








MEYER ABRAHAM GIRSHICK 1908-1955 


By DAVID BLACKWELL AND ALBERT H. BOWKER 


Meyer Abraham Girshick, a Fellow of the Institute and its president in 1952, 
died in the Palo Alto Hospital on March 2, 1955 at the age of 46. He was born 
in a small Russian village and came to New York City at the age of 15 years in 
1923. The principal of the elementary school he attended in New York was 
Angelo Patri who took a strong interest in the boy and helped him get into 
Columbia College in 1929. In 1932 he married Mary Knabel. In 1934 he entered 
graduate school at Columbia University to work with Professor Harold Hotelling 
who arranged a stipend from a Carnegie Foundation grant. 

Girshick left Columbia in 1937 to begin a very distinguished career in govern- 
ment service. For the next ten years he held positions in several government 
and government sponsored agencies including the Bureau of Home Economics 
and the Bureau of Agricultural Economics in the Department of Agriculture, 
the Statistical Research Group at Columbia University, the Bureau of the 
Census, and the Rand Corporation in Santa Monica. He joined the staff of 
Stanford University as Professor of Statistics in 1948. He is survived by his wife 
Mary and their daughter Paula. 

After he left Columbia in 1937, he undertook a pioneer study [3] of body 
measurements of 147,000 American children for the purpose of helping manu- 
facturers of clothing develop an improved system of sizing garments. At the same 
time he began a series of evening courses at the Department of Agriculture 
graduate school. Through these courses he attracted many research workers to 
the field and played an important part in promoting the use of sound statistical 
methods in the federal government. 

He moved from the Bureau of Home Economics to become principal statis- 
tician in the Bureau of Agricultural Economics in 1939, a position which he left 
to join the Statistical Research Group at Columbia University. This period of 
activity at SRG had a decisive influence on his career. While there, he partici- 
pated in the development of sequential analysis and wrote his two most im- 
portant papers [10], [11] in this area. During this period he became acquainted 
with and immediately recognized the importance of the new and more sophisti- 
cated decision theory models for statistical problems being developed by Wald. 
From about 1946 most of his work was explicitly formulated in terms of loss 
functions and other decision theory concepts. This interest was reinforced by his 
work in games at the Rand Corporation, and a major portion of his time in 
recent years was spent in an effort to clarify and extend the basic results of de- 
cision theory [23]. Girshick soon found himself surrounded by students, junior 
colleagues, and others in the University who sought his advice, counsel, and en- 
couragement in their work. 

At the time of the Korean war, Girshick organized a military research group 
at Stanford with the sponsorship of the Office of Naval Research. His leadership 
of the applied statistics group at Stanford has long been considered a model of 
University participation in military research, and through his efforts many 
University scientists have been able to contribute directly to difficult problems in 
theoretical statistics of military interest. His intellectual leadership in both the 
Statistics Department and projects, and enthusiastic interest in scholarly work 
were major factors in the growth of Statistics at Stanford. Most of the work 
produced by the Statistics Department represents his ideas or his spirit. 

At the time of his death, he was exploring the role of invariance in statistical 
problems, an interest reflected earlier in [18]. This work was continued actively 
at Stanford University and became one of the major themes of research in the 
growing Statistics Department. 

Girshick was notable for his receptivity to new concepts (sequential analysis, 
decision theory, game theory, invariance), his tremendous energy and drive, 
the wealth of new ideas and conjectures he produced, and his persistent and 
usually successful efforts to get others to work in directions he considered fruitful. 
His influence in statistics was at least as much through the impact he had on all 
who came in contact with him as through his own writings and will be felt for 
a long time. 


THE TREATMENT OF TIES IN SOME NONPARAMETRIC TESTS 


By JOSEPH PUTTER 
University of California, Berkeley 


1. Introduction. Most of nonparametric testing theory is usually presented 
under the assumption that all the samples involved are drawn from continuous 
distributions, and that tied observations can therefore be ignored or treated in 
any convenient way, without affecting the performance characteristic of the 
test. In practice, however, this assumption is not a realistic one, and the dis- 
tributions involved are in general to be regarded as discontinuous, either because 
of intrinsic reasons (integer-valued or otherwise discrete random variables) 
or because of limitations on the precision of measurements. Therefore, usually, 
ties will occur with positive probability, and the way they are treated does affect 
the performance characteristic of the test. The problem of ties has therefore to 
be considered, in particular with a view to preserving the nonparametric charac- 
ter of the test, and to making sure of setting it up on the desired level of signifi- 
cance. 

The usual practice in attacking the problem has been to consider the condi- 
tional distributions of the statistics concerned given that the number of observa- 
tions in each tied group is a fixed constant. This, however, was never explicitly 
made clear, and these conditional distributions, as well as their variances and 
other characteristics, are referred to as distributions (or variances, etc.) “when 
ties are present.” In this category belong Kendall’s work on ties in rank correla- 
tion theory, and Kruskal’s theorem concerning a generalized Wilcoxon test (see 
Section 8). 

In this paper, we attack the problem from the standpoint of the ties being 
random variables. Our main concern is the comparison between the “ran- 
domized” and the “nonrandomized” way of treating the ties. In Sections 3 and 
4 we consider the one-sided sign test, and show that randomization reduces 
both the exact power and the asymptotic efficiency of the test. In Sections 5-8 
we consider the Wilcoxon test. For small samples the nonrandomized treatment 
of ties presents practical difficulties, but the asymptotic (large sample) problem 
can be handled. Again, it is shown that randomization results in reduced effi- 
ciency. 


2. Notation and theorems used. We shall use the notation $\mathcal{N}(a, b)$ for normal 
random variables (with mean a and variance b), and $\mathcal{B}(n, p)$ for binomials. The 
symbol $\xrightarrow{P}$ will denote convergence in probability, and $\xrightarrow{L}$ convergence in 
law (convergence of distributions). 

To compare the asymptotic performances of two consistent tests, we shall 
use Pitman’s concept of asymptotic relative efficiency. The concept is presented 
in Pitman’s lecture notes and also by Noether [11]. In particular, we shall use 
the following theorem. 

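THEOREM A (Pitman, as quoted in Noether [11], pp. 241-242). Let H be a 
hypothesis specifying the value $\theta_0$ of a population parameter $\theta$, and A the one-sided 
alternative $\theta > \theta_0$. Let $\{\tau_{in}\}$, $i = 1, 2$; $n = 1, 2, \ldots$, be two sequences of tests 
of H against A, on the same level of significance $\alpha$. Let $\tau_{in}$ consist of rejecting H 
when $S_{in} > k_{in}$, where $S_{in}$ are statistics and $k_{in}$ appropriate constants. Let $\psi_{in}(\theta)$ 
and $\sigma_{in}(\theta)$ be functions such that $\psi'_{in}(\theta)$ exists in the neighborhood of $\theta_0$, and let 
the following conditions be satisfied as $n \to \infty$: 

(2.1)  $$\frac{\psi'_{in}(\theta_n)}{\psi'_{in}(\theta_0)} \to 1, \qquad \theta_n = \theta_0 + k\,n^{-1/2},\ k \text{ a positive constant};$$

(2.2)  $$\frac{\sigma_{in}(\theta_n)}{\sigma_{in}(\theta_0)} \to 1;$$

(2.3)  $$n^{-1/2} H_i(n) \to c_i, \qquad H_i(n) = \frac{\psi'_{in}(\theta_0)}{\sigma_{in}(\theta_0)},\ c_i \text{ a positive constant};$$

and either 

(2.4)  $$\frac{S_{in} - \psi_{in}(\theta)}{\sigma_{in}(\theta)} \xrightarrow{L} \mathcal{N}(0, 1)$$

uniformly in $\theta$ in the neighborhood of $\theta_0$, or 

(2.5)  $$\frac{S_{in} - \psi_{in}(\theta_n)}{\sigma_{in}(\theta_n)} \xrightarrow{L} \mathcal{N}(0, 1).$$

Then the asymptotic relative efficiency of $\{\tau_{2n}\}$ with respect to $\{\tau_{1n}\}$ is 
$\lim_{n \to \infty} H_2^2(n)/H_1^2(n)$. 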

(Noether defines $\psi_{in}(\theta) = E(S_{in} \mid \theta)$ and $\sigma_{in}(\theta) = \sigma(S_{in} \mid \theta)$, but it is easily 
seen from Pitman’s proof of the theorem that this specification is not necessary.) 

To handle the uniform convergence required in condition (2.4), we shall use 
the following theorem. 

THEOREM B (Parzen [12], p. 35). A necessary and sufficient condition for a sequence 
of distributions $F_n = F_n^{\theta}$ to converge to a distribution F uniformly in $\theta$ 
is that 

(2.6)  $$f_n^{\theta}(t) \to f(t) \quad \text{uniformly in } \theta,$$

where $f_n^{\theta}$ and $f$ denote the respective characteristic functions. The convergence 
(2.6) is then jointly uniform in $\theta$ and t for every finite t-interval. 
(Theorem B is a particular case of Parzen’s Theorem 7c.) 


THE SIGN TEST 


3. Randomized and nonrandomized test. Let $Z_1, \ldots, Z_n$ be independent 
and identically distributed random variables. Denote the number of positive 
$Z_i$'s by $N_+$, of negative $Z_i$'s by $N_-$, and of zeros among the $Z_i$'s by $N_0$. The 
sign test consists of rejecting the hypothesis 

$$H:\ P(Z_i > 0) = P(Z_i < 0),$$

against the alternative 

$$A:\ P(Z_i > 0) > P(Z_i < 0),$$

say, whenever $N_+$ is too large. 

In practice $Z_i$ frequently is of the form $X_i - Y_i$, where $X_i$ and $Y_i$ are independent. 
If the distribution functions of $X_i$ and $Y_i$ are continuous, then $P(Z_i = 0) = 0$. 
In this case, under the hypothesis, $N_+$ is $\mathcal{B}(n, \tfrac12)$, which gives us the cutoff 
point. 

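In the general (discontinuous) case, denote 

$$P(Z_i > 0 \mid H) = p_+, \qquad P(Z_i = 0 \mid H) = p_0,$$
$$P(Z_i > 0 \mid A) = q_+, \qquad P(Z_i = 0 \mid A) = q_0, \qquad P(Z_i < 0 \mid A) = q_-.$$

Consider the conditional distribution of $N_+$ given that $N_0 = n_0$. Under H, 

$$P(N_+ = x \mid n_0) = p_H(x) = \binom{n - n_0}{x} \left(\tfrac12\right)^{n - n_0};$$

under A, 

$$P(N_+ = x \mid n_0) = p_A(x) = \binom{n - n_0}{x} \left(\frac{q_+}{q_+ + q_-}\right)^{x} \left(\frac{q_-}{q_+ + q_-}\right)^{n - n_0 - x},$$

$x = 0, 1, \ldots, n - n_0$. Thence 

$$\frac{p_A(x)}{p_H(x)} = c(n_0) \left(\frac{q_+}{q_-}\right)^{x},$$

which is a strictly increasing function of x. Therefore, by the Neyman-Pearson 
lemma, the unique most powerful test based on $N_+$ and $N_0$ is given by 

(3.1)  $N_+ > k(N_0)$, 

where the cutoff point $k(n_0)$ is, of course, the one corresponding to $\mathcal{B}(n - n_0, \tfrac12)$. 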
It is obvious that $k(N_0)$ is not a linear function of $N_0$. Thence the test (3.1) 
does not coincide with the test 

(3.2)  $N_+ + \tfrac12 N_0 > k$, 

which was proposed, e.g., by Dixon and Mood [3]. In fact, the distribution of 
$N_+ + \tfrac12 N_0$ under H depends on the unknown parameter $p_0$, so that the cutoff 
point k cannot be well defined. The usual practice seems to be to take for k the 
cutoff point corresponding to $\mathcal{B}(n, \tfrac12)$. This, as was shown by Hemelrijk [4], 
results in lowering the level of significance and consequently also the power of 
the test. However, the difficulty caused by the dependence of k on $p_0$ can be obviated 
when asymptotic properties are considered, and we shall return to the 
matter in the next section. 

The test (3.1), which amounts to “omitting the ties from the observations” 
and which was suggested, e.g., by Dixon and Massey [2], is, as we have seen, 
the unique most powerful test based on $N_+$ and $N_0$. However, another customary 
procedure is one based on “randomization”: after observing the $Z_i$'s, we perform 
$N_0$ independent random experiments, assigning each of the $N_0$ zeros among the 
$Z_i$'s a positive or negative sign with equal probabilities ($= \tfrac12$). We thus get, say, 
$N'_+$ additional positives. The random variable $N^*_+ = N_+ + N'_+$ is, under H, 
$\mathcal{B}(n, \tfrac12)$, and we can apply the test 

(3.3)  $N^*_+ > k$ 

without worrying about the unknown $p_0$. 

Consider, again, the conditional situation given that $N_0 = n_0$. Denote by 
$p(y)$ the frequency distribution of $\mathcal{B}(n_0, \tfrac12)$. The joint (conditional) frequency 
distribution of $N_+$ and $N'_+$ is $p_H(x)p(y)$ under H, and $p_A(x)p(y)$ under A. The 
ratio of the two expressions is $p_A(x)/p_H(x)$, so that (3.1) is also the unique most 
powerful test based on $N_+$, $N_0$, and $N'_+$. We have thus proved the following 
theorem. 

THEOREM 1. The nonrandomized test (3.1) is uniformly more powerful (against 
the one-sided alternative A) than the randomized test (3.3). 

As a numerical example, we give in Table I the powers of the two tests for n = 
10, against the alternative $q_+/q_- = 2$. Since the power of either test depends on 
$q_0$, we tabulate the conditional power given $N_0 = n_0$, for all values of $n_0$. The 
tests are considered on the .05 level. To keep this level exact (and to get a valid 
comparison between the tests), we modify the tests in the usual way. For example, 
the test (3.3) is now formulated as follows: Reject H with probability 1 
if $N^*_+ > k$; reject H with probability $\varphi$ if $N^*_+ = k$; accept H otherwise. In our 
particular case, k = 8 and $\varphi = .893$. 


TABLE I 

  n_0    Power of randomized test (3.3)    Power of nonrandomized test (3.1)
   0               .278                              .278
   1            [illegible]                          .244
   2               .208                              .232
   3               .177                              .216
   4               .150                              .184
   5               .127                              .171
   6               .106                              .158
   7            [illegible]                          .119
   8               .074                              .088
   9               .061                              .067
  10               .050                              .050








In particular, against the alternative $q_0 = q_- = \tfrac14$, $q_+ = \tfrac12$, the power of (3.3) 
is .195, while that of (3.1) is .221. 
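
The conditional powers in Table I can be recomputed directly from the binomial distributions described in Section 3. The following is a minimal sketch, not the author's computation: the function names are illustrative, and the exact .05 level is obtained, as in the text, by rejecting with a fixed probability when the statistic equals the cutoff.

```python
from math import comb

def binom_pmf(n, p):
    """Binomial(n, p) probabilities, indexed by the number of successes."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def exact_level_test(null_pmf, alpha):
    """Return (k, phi): reject above k, reject with probability phi at k."""
    tail = 0.0
    for k in range(len(null_pmf) - 1, -1, -1):
        if tail + null_pmf[k] > alpha:
            return k, (alpha - tail) / null_pmf[k]
        tail += null_pmf[k]
    return -1, 0.0  # only reached when alpha >= 1

def power(pmf, k, phi):
    """Rejection probability of the test (k, phi) under the distribution pmf."""
    return sum(pmf[k + 1:]) + (phi * pmf[k] if k >= 0 else 0.0)

def conditional_powers(n, n0, ratio, alpha=0.05):
    """Powers of tests (3.3) and (3.1) given N_0 = n0, with ratio = q_+/q_-."""
    p_plus = ratio / (1 + ratio)        # q_+ / (q_+ + q_-)
    untied = binom_pmf(n - n0, p_plus)  # N_+ given N_0 = n0, under A
    coin = binom_pmf(n0, 0.5)           # N'_+: fair coin for each zero
    star = [0.0] * (n + 1)              # N*_+ = N_+ + N'_+ by convolution
    for x, px in enumerate(untied):
        for y, py in enumerate(coin):
            star[x + y] += px * py
    k_r, phi_r = exact_level_test(binom_pmf(n, 0.5), alpha)
    k_n, phi_n = exact_level_test(binom_pmf(n - n0, 0.5), alpha)
    return power(star, k_r, phi_r), power(untied, k_n, phi_n)
```

Averaging the conditional powers over the binomial distribution of $N_0$ (with parameter $q_0$) gives the unconditional powers quoted for a specific alternative.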


4. Asymptotic properties. For large sample sizes n, it is convenient to use the 
normal approximation to the binomial, and we shall now compare the per- 
formances of the randomized and nonrandomized tests when this approximation 


is used. 

For the randomized test statistic $N^*_+$ we have, under H, 

(4.1)  $$\frac{2N^*_+ - n}{n^{1/2}} \xrightarrow{L} \mathcal{N}(0, 1),$$

which gives us the usual normal approximation to (3.3). Under A, $N^*_+$ is 
$\mathcal{B}(n, q_+ + \tfrac12 q_0)$, and hence 

(4.2)  $$\frac{N^*_+ - n(q_+ + \tfrac12 q_0)}{[n(q_+ + \tfrac12 q_0)(q_- + \tfrac12 q_0)]^{1/2}} \xrightarrow{L} \mathcal{N}(0, 1).$$

It is easily seen from (4.1) and (4.2) that the test (3.3) is consistent against the 
one-sided alternative A. 

It is more difficult to derive the normal approximation to (3.1). Since in any 
case the normal approximation of a test is not an obviously definable concept, 
we shall derive a nonrandomized asymptotic test by starting from (3.2). The 
joint distribution of $N_+$, $N_0$, and $N_-$ is trinomial; hence the test statistic 

$$\bar N_+ = N_+ + \tfrac12 N_0$$

is asymptotically normal. More precisely, under H, 

$$\frac{2\bar N_+ - n}{[n(1 - p_0)]^{1/2}} \xrightarrow{L} \mathcal{N}(0, 1).$$

Since $N_0/n \xrightarrow{P} p_0$, we have 

(4.3)  $$\frac{2\bar N_+ - n}{(n - N_0)^{1/2}} \xrightarrow{L} \mathcal{N}(0, 1),$$

which gives us an asymptotic test independent of $p_0$. Under A, 

(4.4)  $$\frac{\bar N_+ - n(q_+ + \tfrac12 q_0)}{\{n[q_+ + \tfrac14 q_0 - (q_+ + \tfrac12 q_0)^2]\}^{1/2}} \xrightarrow{L} \mathcal{N}(0, 1),$$

and, again, the nonrandomized test corresponding to (4.3) is consistent against A. 
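
For illustration, the two large-sample statistics of this section can be computed from the three counts. This is a sketch under the assumption that the nonrandomized statistic is $(2(N_+ + \tfrac12 N_0) - n)/(n - N_0)^{1/2}$, which simplifies to $(N_+ - N_-)/(N_+ + N_-)^{1/2}$; the function names are hypothetical.

```python
import random
from math import sqrt

def nonrandomized_statistic(n_plus, n_zero, n_minus):
    """(2(N_+ + N_0/2) - n) / sqrt(n - N_0); zeros are effectively omitted."""
    n = n_plus + n_zero + n_minus
    if n == n_zero:
        raise ValueError("all observations are ties; statistic undefined")
    return (2 * (n_plus + 0.5 * n_zero) - n) / sqrt(n - n_zero)

def randomized_statistic(n_plus, n_zero, n_minus, rng=random):
    """(2 N*_+ - n) / sqrt(n), each zero given a sign by a fair coin toss."""
    n = n_plus + n_zero + n_minus
    extra = sum(rng.random() < 0.5 for _ in range(n_zero))  # N'_+
    return (2 * (n_plus + extra) - n) / sqrt(n)
```

Either value is referred to the upper tail of the standard normal distribution; only the second depends on random numbers outside the data.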

We now compare the asymptotic performances of the two tests in terms of 
Pitman’s concept of asymptotic relative efficiency. In the notation of Theorem 
A, put 

$$\theta = q_+ + \tfrac12 q_0, \qquad \theta_0 = \tfrac12.$$

THEOREM 2. Let $\{A_\theta,\ \theta > \tfrac12\}$ be a family of alternatives for which $q_0 = p_0$. 
Then the asymptotic relative efficiency of the randomized test (3.3) with respect to 
the (nonrandomized) test based on the statistic 

$$T_n = \frac{2\bar N_+ - n}{(n - N_0)^{1/2}}, \qquad \bar N_+ = N_+ + \tfrac12 N_0,$$

is $1 - p_0$. 
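
PROOF. Put 

$$S_{1n} = T_n (1 - p_0)^{1/2}, \qquad S_{2n} = N^*_+,$$
$$\psi_{1n}(\theta) = (2\theta - 1) n^{1/2}, \qquad \psi_{2n}(\theta) = n\theta,$$
$$\sigma_{1n}(\theta) = (4\theta(1 - \theta) - p_0)^{1/2}, \qquad \sigma_{2n}(\theta) = (n\theta(1 - \theta))^{1/2}.$$

Conditions (2.1)-(2.3) obviously hold, and we proceed to verify (2.4) and/or 
(2.5). 

For i = 2, the convergence (2.4) holds by (4.2). From the usual proof of binomial 
convergence to the normal, it is easily seen that the corresponding convergence 
of characteristic functions is uniform in $\theta$ in the neighborhood of $\tfrac12$, 
and hence, by Theorem B, so is (4.2), and condition (2.4) holds. For i = 1, we 
have, writing $\bar N_+ = N_+ + \tfrac12 N_0$, 

$$\frac{S_{1n} - \psi_{1n}(\theta)}{\sigma_{1n}(\theta)} = \frac{2(\bar N_+ - \theta n)}{(n[4\theta(1 - \theta) - p_0])^{1/2}} \left(\frac{n(1 - p_0)}{n - N_0}\right)^{1/2} + \frac{(2\theta - 1) n^{1/2}}{(4\theta(1 - \theta) - p_0)^{1/2}} \left[\left(\frac{n(1 - p_0)}{n - N_0}\right)^{1/2} - 1\right].$$

Now, 

$$\frac{2(\bar N_+ - \theta n)}{(n[4\theta(1 - \theta) - p_0])^{1/2}} \xrightarrow{L} \mathcal{N}(0, 1)$$

by (4.4), and this convergence, as before, can easily be shown to be uniform in $\theta$ 
in the neighborhood of $\tfrac12$. We have 

$$\left(\frac{n(1 - p_0)}{n - N_0}\right)^{1/2} \xrightarrow{P} 1$$

independently of $\theta$, and 

$$\frac{(2\theta_n - 1) n^{1/2}}{(4\theta_n(1 - \theta_n) - p_0)^{1/2}} \left[\left(\frac{n(1 - p_0)}{n - N_0}\right)^{1/2} - 1\right] \xrightarrow{P} 0.$$

Hence condition (2.5) holds. Our result now follows from Theorem A. 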
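
As a check on the value $1 - p_0$, the efficiency can be evaluated directly from the functions chosen in the proof. The worked step below assumes that the quantity entering Theorem A is $H_i(n) = \psi'_{in}(\theta_0)/\sigma_{in}(\theta_0)$, evaluated at $\theta_0 = \tfrac12$.

```latex
\begin{align*}
H_1(n) &= \frac{\psi'_{1n}(\tfrac12)}{\sigma_{1n}(\tfrac12)}
        = \frac{2 n^{1/2}}{(1 - p_0)^{1/2}}, \\
H_2(n) &= \frac{\psi'_{2n}(\tfrac12)}{\sigma_{2n}(\tfrac12)}
        = \frac{n}{(n/4)^{1/2}} = 2 n^{1/2}, \\
\frac{H_2^2(n)}{H_1^2(n)} &= \frac{4n}{4n/(1 - p_0)} = 1 - p_0 .
\end{align*}
```

The ratio does not depend on n, so the limit in Theorem A is attained for every sample size.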


Now, 


THE WILCOXON TEST 


5. Notation and known results. We shall use the following notation in connection 
with the Wilcoxon test. $(X_1, \ldots, X_n)$ is a sample of n independent observations 
from a distribution F(x), and $(Y_1, \ldots, Y_m)$ is a sample of m independent 
observations from a distribution G(x). If all the m + n observations in 
the pooled sample are different, we rank them in ascending order of magnitude, 
assigning the rank 1 to the smallest observation. We denote by $S_{nm}$ the sum of 
the ranks assigned to the X’s. The Wilcoxon test of the hypothesis F = G consists 
of rejecting the hypothesis when $S_{nm}$ is too large. 

The mean, variance, and asymptotic distribution of $S_{nm}$ in the case when F 
and G are continuous (and therefore the probability of getting two or more 
equal observations is 0), are known, and are summarized below. 

When F = G, every possible ordering of the pooled sample occurs with the 
same probability $1/(n + m)!$, and the distribution of $S_{nm}$ can be derived from 
this fact alone. We shall denote any statistic with this probability distribution 
by $S^0_{nm}$. From Mann-Whitney [10] we have 

(5.1)  $$E S^0_{nm} = \frac{n(n + m + 1)}{2} = \mu_{nm}, \text{ say};$$

(5.2)  $$\sigma^2(S^0_{nm}) = \frac{nm(n + m + 1)}{12} = \sigma^2_{nm}, \text{ say};$$

(5.3)  $$T^0_{nm} = \frac{S^0_{nm} - \mu_{nm}}{\sigma_{nm}} \xrightarrow{L} \mathcal{N}(0, 1) \quad \text{as } \frac1n + \frac1m \to 0.$$
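
The moments (5.1) and (5.2) can be confirmed for small samples by brute-force enumeration, since under F = G every set of n ranks out of n + m is equally likely for the X’s. The sketch below uses a hypothetical helper name and exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def null_rank_sum_moments(n, m):
    """Exact mean and variance of the X rank sum over all equally likely rank sets."""
    sums = [sum(ranks) for ranks in combinations(range(1, n + m + 1), n)]
    count = len(sums)
    mean = Fraction(sum(sums), count)
    variance = Fraction(sum(s * s for s in sums), count) - mean**2
    return mean, variance
```

For n = 3 and m = 4 the enumeration returns mean 12 and variance 8, in agreement with n(n + m + 1)/2 and nm(n + m + 1)/12.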


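In general, when F and G are any two (continuous) distributions, we have, 
from Mann-Whitney [10], 

(5.4)  $$E S_{nm} = \mu_{nm} + nm\theta,$$

(5.5)  $$\theta = \theta(F, G) = P(X_1 > Y_1) - \tfrac12 = \int G(x)\, dF(x) - \tfrac12,$$

(5.6)  $$\sigma^2(S_{nm}) = \sigma^2_{nm} + nm[(\theta - \lambda_1)(n - 1) + (\theta - \lambda_2)(m - 1) - \theta^2(n + m - 1)],$$

(5.7)  $$\lambda_1 = \lambda_1(F, G) = \tfrac13 - \int F^2(x)\, dG(x),$$

(5.8)  $$\lambda_2 = \lambda_2(F, G) = \tfrac13 - \int (1 - G(x))^2\, dF(x).$$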


When $n \to \infty$ while m/n = c is held constant, we have, from Lehmann [9], 

(5.9)  $$\frac{S_{nm} - E S_{nm}}{\sigma(S_{nm})} \xrightarrow{L} \mathcal{N}(0, 1).$$

(That $\sigma(S_{nm})$ is the correct norming factor can be seen from Hoeffding [6], 
Theorem 5.2.) 

For the case of discontinuous F and G, which we shall consider in the following 
sections, we adopt the following notation. We assume the common discontinuities 
of F and G (which are the only ones that matter) to be finite in number, 
and denote them by $\xi_k$, $k = 1, \ldots, K$. Their locations are not assumed known, 
and are irrelevant to our considerations. We define 

$p_k = P(X_1 = \xi_k)$, $q_k = P(Y_1 = \xi_k)$; 
$U_k$ = the number of X’s which are equal to $\xi_k$; 
$V_k$ = the number of Y’s which are equal to $\xi_k$; 
$W_k = U_k + V_k$; 
$U = (U_1, \ldots, U_K)$, $V = (V_1, \ldots, V_K)$, $W = (W_1, \ldots, W_K)$. 

We shall write $\sum_k$ for $\sum_{k=1}^{K}$. 

6. The treatment of ties. When F and G are continuous, the probability of 
getting tied (equal) observations is 0, so that this event may be ignored. In the 
discontinuous case, however, ties occur with positive probability, and when 
they do occur, the pooled sample can no longer be uniquely ordered. The prob- 
lem arises, therefore, of how the Wilcoxon test is to be defined in such a case. 

An obvious solution to the problem, proposed by many writers on the subject, 
is, again, ‘“‘randomization’’: each group of equal observations is ordered at ran- 
dom, giving every possible ordering (within the group) the same probability. 
This results in an ordering of the pooled sample, and the sum of the ranks of the 
X’s can now be defined. The only difference from the continuous case is that this 
new random variable is defined over a different sample space, because its value 
depends not only on the observed X’s and Y’s but also on the outcome of the 
randomization procedure. We shall denote this sum of the “randomized” ranks 
of the X’s also by $S_{nm}$. 

Again, if F = G, every possible (randomized) ordering of the pooled sample is 
equally probable, and hence $S_{nm}$ is distributed as $S^0_{nm}$. The Wilcoxon test can 
therefore be applied using $S_{nm}$ with the same cutoff point as in the continuous 
case. The main objection to this procedure seems to be that the outcome of the 
test (rejection or acceptance of the hypothesis) is thus made to depend not only 
on the observations but also on an additional, and more or less irrelevant, ran- 
dom experiment. We are thus led to look for a test which is 

(i) distribution-free under the hypothesis; 
(ii) dependent on the observations only; and 

(iii) as close as possible to the original Wilcoxon test. 

We leave the precise meaning of this last requirement unspecified for the mo- 
ment, and shall elaborate the point later on. 

For the remainder of this section, we shall need to consider only the case when 
F = G, and it will be convenient to assume that F is purely discontinuous. 
In this case, the ordering of the pooled sample is given by the nonzero components 
of the two vectors U and V, as long as the observations alone are considered. 
Hence any rank (order) statistic which depends on the observations only 
can be expressed in terms of U and V. Requirement (ii) means, therefore, that 
the rejection region R of the test will be a region in the 2K-dimensional sample 
space of the random vector $(U_1, \ldots, U_K, V_1, \ldots, V_K)$. 

In this sample space, the vector W is a sufficient statistic for the vector parameter 
$(p_1, \ldots, p_K)$, i.e., the conditional probability 

$$P(u \mid w) = P(U = u, V = w - u \mid W = w) = \frac{n!}{u_1! \cdots u_K!} \cdot \frac{m!}{(w_1 - u_1)! \cdots (w_K - u_K)!} \Big/ \frac{(m + n)!}{w_1! \cdots w_K!}$$

is independent of the $p_k$’s. Hence, if the size $\alpha$ of R, that is, 

$$P(R) = \sum_w P(W = w)\, P(R \mid W = w) = \sum_w \frac{(m + n)!}{w_1! \cdots w_K!}\, p_1^{w_1} \cdots p_K^{w_K} \sum_{(u, w - u) \in R} P(u \mid w),$$

is to be independent of the $p_k$’s (requirement (i)), we must have $P(R \mid W = w) = \alpha$ 
for every w, which is the usual condition on distribution-free tests when a 
sufficient statistic with a complete family is involved. But since for every fixed 
w we have only a finite set of probabilities $P(u \mid w)$, and these sets vary with w, 
it will in general be impossible to find a region R with exact size $\alpha$. However, 
this difficulty can be obviated, e.g., by considering regions which include some 
sample points not definitely but with certain given probabilities. Thus it appears 
that some random element outside the observations is unavoidable, unless we 
do not insist on the exact size $\alpha$. But in practice this consideration is unimportant, 
because one is usually quite content to stop just short of the given size $\alpha$. 

Suppose that various regions R of the required type and of exact size $\alpha$ (produced 
by the above, or any other, device) are available. Denote by $R_0$ the rejection 
region $[S_{nm} > a]$, of the same size $\alpha$, given by the “randomized” Wilcoxon 
test. Then $R_0$ is defined in a different sample space, which can be described as 
the result of splitting each point of the (u, v)-space into several points corresponding 
to the possible outcomes of the randomization procedure. We shall view 
the sets R as sets in this “extended” sample space, too. 

We have $P(R) = P(R_0) = \alpha$, or $P(R \cap \bar R_0) = P(\bar R \cap R_0)$, where the notation 
$\bar A$ stands for the complement of A. Now, one possible interpretation of requirement 
(iii) above is to choose R so as to minimize $P(R \cap \bar R_0)$. This may be justified 
as follows. Suppose F is really continuous, and the ties occur only because 
of insufficient precision of measurement. The randomized test is, in a sense, approximately 
equivalent to the (Wilcoxon) test which we would use if our measurements 
were precise, because the effect of the randomization procedure is 
similar to the effect of replacing each discontinuity by an interval of uniform 
distribution (cf. Section 7). It is therefore reasonable to try to minimize the 
probability of getting a result (rejection or acceptance of the hypothesis) different 
from the result of the randomized test. But this probability, when the 
hypothesis is true, is $P(R \cap \bar R_0) + P(\bar R \cap R_0) = 2P(R \cap \bar R_0)$. 

We thus want to minimize $P(R \cap [S_{nm} \le a])$, which will be achieved if we 
minimize 

$$P(R \cap [S_{nm} \le a] \mid W = w) = \sum_{(u, w - u) \in R} P(u \mid w)\, P(S_{nm} \le a \mid U = u, V = w - u)$$

for every w. This is to be done under the condition 

(6.1)  $$\sum_{(u, w - u) \in R} P(u \mid w) = P(R \mid W = w) = \alpha.$$

In a manner analogous to the proof of the Neyman-Pearson lemma, it is easily 
seen that the “optimum” region is obtained by the following procedure. For 
every vector w, we order all the possible vectors (u, v) = (u, w - u) by the magnitude 
of $P(S_{nm} \le a \mid U = u, V = w - u)$. We take that vector (u, w - u) for 
which this probability is smallest, then that vector for which it is the next 
smallest, etc., until the (conditional) size $\alpha$, as in (6.1), is reached. Doing this 
for all w, we get the desired R. 

Unfortunately, the tabulations required for this test are much too extensive. 
We can approximate the test if, instead of rejecting the hypothesis when 
$P(S_{nm} \le a \mid U = u, V = w - u)$ is too small, we reject it when 
$E(S_{nm} \mid U = u, V = w - u)$ is too large. The two tests will probably not differ too much. 

The statistic 

(6.2)  $$S'_{nm} = E_r(S_{nm} \mid U, V),$$

where $E_r$ denotes expected value under randomization, is the sum of the midranks 
of the X’s, where the midrank of an observation is defined as the mean 
rank of all the observations equal to that observation, or, more precisely, 

$$\text{midrank}(X) = \frac{N_1 + N_2 + 1}{2},$$

where $N_1$ is the number of observations smaller than X, and $N_2$ is the number 
of observations (including X) not larger than X. The statistic $S'_{nm}$ has been 
proposed by many writers as a test statistic to replace $S_{nm}$ when ties are present. 
However, by the preceding considerations, the cutoff point has to depend on W, 
and the tabulation involved is prohibitive. A few cases have been tabulated in 
[13], but they can merely serve as an indication of the task involved in more 
exhaustive tabulation. 
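
The midrank definition translates directly into code. The sketch below (the function name is illustrative) counts, for each X, the pooled observations smaller than it and those not larger than it, and sums the resulting midranks.

```python
def midrank_sum(xs, ys):
    """Sum of the midranks of the X's in the pooled sample of xs and ys."""
    pooled = list(xs) + list(ys)
    total = 0.0
    for x in xs:
        smaller = sum(1 for v in pooled if v < x)      # N_1
        not_larger = sum(1 for v in pooled if v <= x)  # N_2, includes x itself
        total += (smaller + not_larger + 1) / 2
    return total
```

With xs = [1, 2, 2] and ys = [2, 3], the three tied 2’s share the midrank 3, so the sum is 1 + 3 + 3 = 7; without ties the function reduces to the ordinary rank sum.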

Kruskal [7] derived the conditional asymptotic distribution of $S'_{nm}$ given 
fixed W = w(n, m) which fulfill a certain convergence condition (cf. Section 8) 
for the case F = G. In the next two sections we shall derive the (unconditional) 
asymptotic distribution of $S'_{nm}$ in general, and discuss some consequences. 


7. The asymptotic distribution of $S'_{nm}$. We now drop the assumption that F 
and G are purely discontinuous. Consider the conditional distribution of $S_{nm}$ 
given a fixed pooled sample of X’s and Y’s. For this fixed sample, let U = u, 
V = v. Denote by r the sum of the ranks of those X’s which are not equal to 
any $\xi_k$ (and which are therefore, with probability 1, untied), and by $r_k$ the number 
of those observations (X’s and Y’s) which are smaller than $\xi_k$. 

Under the randomization procedure which generates $S_{nm}$, those observations 
which are equal to $\xi_k$ are assigned the ranks $r_k + 1, r_k + 2, \ldots, r_k + u_k + v_k$ at 
random, with every ordering equally probable. Hence the sum of the ranks of 
those X’s which are equal to $\xi_k$ is $u_k r_k + S^0_{u_k v_k}$, and 

$$S_{nm} = r + \sum_k (u_k r_k + S^0_{u_k v_k}).$$

Therefore, by (6.2) and (5.1), 

$$S'_{nm} = r + \sum_k (u_k r_k + \mu_{u_k v_k}),$$
$$S_{nm} = S'_{nm} + \sum_k (S^0_{u_k v_k} - \mu_{u_k v_k}),$$

where the K + 1 terms on the right are (conditionally) independent. Since this 
holds for every fixed sample, we can write 

(7.1)  $$S_{nm} = S'_{nm} + \sum_k (S^0_{U_k V_k} - \mu_{U_k V_k}),$$

where the terms on the right are conditionally independent given U and V. 
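Obviously, we have 

$$E S'_{nm} = E E_r(S_{nm} \mid U, V) = E S_{nm}.$$

To calculate $\sigma^2(S'_{nm})$, we note that, by (7.1), 

$$\sigma^2(S_{nm}) = E(S_{nm} - E S_{nm})^2 = E E_r[(S_{nm} - E S_{nm})^2 \mid U, V]$$
$$= E E_r\Big\{\Big[S'_{nm} - E S'_{nm} + \sum_k (S^0_{U_k V_k} - \mu_{U_k V_k})\Big]^2 \,\Big|\, U, V\Big\}$$
$$= \sigma^2(S'_{nm}) + \tfrac{1}{12} \sum_k E\, U_k V_k (U_k + V_k + 1)$$
$$= \sigma^2(S'_{nm}) + \frac{nm}{12} \sum_k p_k q_k [(n - 1) p_k + (m - 1) q_k + 3],$$

or 

(7.2)  $$\sigma^2(S'_{nm}) = \sigma^2(S_{nm}) - \frac{nm}{12} \sum_k p_k q_k [(n - 1) p_k + (m - 1) q_k + 3].$$

In particular, when F = G, 

(7.3)  $$\sigma^2(S'_{nm}) = \sigma^2_{nm} - \frac{nm}{12} \sum_k p_k^2 [(n + m - 2) p_k + 3].$$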


Of some interest, when F = G, is also the conditional variance $\sigma^2(S'_{nm} \mid W)$. 
Since the conditional distribution of $S_{nm}$ given W is still that of $S^0_{nm}$, this variance 
can be computed in a manner similar to the preceding argument, giving 
(when F = G) 

(7.4)  $$\sigma^2(S'_{nm} \mid W) = \sigma^2_{nm} - \frac{nm}{12(n + m)(n + m - 1)} \sum_k W_k (W_k^2 - 1).$$

This is the variance given by Kruskal [7], and in a more cumbersome form by 
Hemelrijk [5]. 
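
Formula (7.4) can be checked against exact enumeration for a small configuration of ties. The sketch below assumes purely discontinuous F = G, so that, given the tie-group sizes W, every choice of n of the n + m pooled positions for the X’s is equally likely; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def conditional_variance_formula(n, m, tie_sizes):
    """Right-hand side of (7.4) for tie-group sizes W_1, ..., W_K."""
    total = n + m
    sigma2 = Fraction(n * m * (total + 1), 12)
    ties = sum(w * (w * w - 1) for w in tie_sizes)
    return sigma2 - Fraction(n * m, 12 * total * (total - 1)) * ties

def conditional_variance_enumerated(n, tie_sizes):
    """Variance of the midrank sum over all equally likely placements of the X's."""
    midranks, below = [], 0
    for w in tie_sizes:
        # ranks below+1, ..., below+w are shared; their mean is the midrank
        midranks += [Fraction(2 * below + w + 1, 2)] * w
        below += w
    sums = [sum(choice) for choice in combinations(midranks, n)]
    mean = sum(sums) / len(sums)
    return sum(s * s for s in sums) / len(sums) - mean**2
```

For n = m = 3 and tie groups of sizes 2, 3, and 1, both functions return 9/2, compared with 21/4 for the untied variance.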

Since $E S_{nm}$ and $\sigma^2(S_{nm})$ as given by (5.4) and (5.6) refer to the continuous case, 
we shall touch on the modifications required for discontinuous F and G. By 
Lemma 5.1 of Lehmann [9], there exist two continuous distributions F* and G* 
under which the distribution of $S_{nm}$ is the same as under F and G. These continuous 
distributions are obtained, essentially, by replacing the discontinuities by 
intervals of uniform distribution. We define 

$$\theta^* = \theta^*(F, G) = \theta(F^*, G^*),$$
$$\lambda_i^* = \lambda_i^*(F, G) = \lambda_i(F^*, G^*), \qquad i = 1, 2,$$

referring to the definitions (5.5), (5.7), and (5.8). From (5.4) and (5.6), we now 
have 

(7.5)  $$E S_{nm} = \mu_{nm} + \theta^* nm,$$
$$\sigma^2(S_{nm}) = \sigma^2_{nm} + nm[(\theta^* - \lambda_1^*)(n - 1) + (\theta^* - \lambda_2^*)(m - 1) - \theta^{*2}(n + m - 1)].$$

For later use, we compute $\theta^*$ in terms of F and G. Denote by B the real line with 
the points $\xi_k$ excluded. We have 

(7.6)  $$\theta^* + \tfrac12 = \int G^*(x)\, dF^*(x) = \int_B G(x)\, dF(x) + \sum_k p_k G(\xi_k - 0) + \tfrac12 \sum_k p_k q_k = P(X_1 > Y_1) + \tfrac12 P(X_1 = Y_1),$$

(7.7)  $$\theta^* = P(X_1 > Y_1) + \tfrac12 P(X_1 = Y_1) - \tfrac12.$$

We now give a theorem connecting the asymptotic distribution of $S'_{nm}$ with 
that of $S_{nm}$. Note that the symbol $\sigma^2_{U_k V_k}$ will stand for $\tfrac{1}{12} U_k V_k (U_k + V_k + 1)$, 
as in (5.2), and will have nothing to do with the variances of $U_k$ and $V_k$. This 
refers, of course, to all the symbols with $U_k$ and $V_k$ as subscripts. 

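THEOREM 3. If, for a pair of distributions (F, G), and possibly under some restrictions 
concerning the relation between n and m, we have 

(7.8)  $$\frac{S_{nm} - E S_{nm}}{\sigma_{nm}} \xrightarrow{L} \mathcal{N}(0, b^2),$$

(7.9)  $$\frac{\sigma_{U_k V_k}}{\sigma_{nm}} \xrightarrow{P} b_k, \qquad k = 1, \ldots, K,$$

as $1/n + 1/m \to 0$, then, under the same conditions, 

$$\frac{S'_{nm} - E S'_{nm}}{\sigma_{nm}} \xrightarrow{L} \mathcal{N}(0, b'^2),$$

where 

$$b'^2 = b^2 - \sum_k b_k^2.$$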


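PROOF. Subtracting $E S_{nm}$ from both sides of (7.1) and dividing by $\sigma_{nm}$, we 
have 

(7.10)  $$\frac{S_{nm} - E S_{nm}}{\sigma_{nm}} = \frac{S'_{nm} - E S'_{nm}}{\sigma_{nm}} + \sum_k \frac{\sigma_{U_k V_k}}{\sigma_{nm}}\, T^0_{U_k V_k},$$

where $T^0$ is defined by (5.3). The $U_k$ and $V_k$ are $\mathcal{B}(n, p_k)$ and $\mathcal{B}(m, q_k)$, respectively. 

Let d > 0 be a fixed number which we shall specify later, and define 

$$R_{nm} = \Big[\, \Big|\frac{U_k}{n} - p_k\Big| < d,\ \Big|\frac{V_k}{m} - q_k\Big| < d,\ \Big|\frac{\sigma_{U_k V_k}}{\sigma_{nm}} - b_k\Big| < d,\ \text{all } k \Big].$$

We have, by (7.9), 

(7.11)  $$P(R_{nm}) \to 1 \quad \text{as } 1/n + 1/m \to 0.$$

Define 

$h(t) = e^{-t^2/2}$; 
$h_{nm}(t)$ = the characteristic function of $T^0_{nm}$; 
$f_{nm}(t)$ = the characteristic function of $(S_{nm} - E S_{nm})/\sigma_{nm}$; 
$f^{uv}_{nm}(t)$ = the conditional characteristic function of $(S_{nm} - E S_{nm})/\sigma_{nm}$ given U = u, V = v; 
$g_{nm}(t)$ = the characteristic function of $(S'_{nm} - E S'_{nm})/\sigma_{nm}$; 
$g^{uv}_{nm}(t)$ = the conditional characteristic function of $(S'_{nm} - E S'_{nm})/\sigma_{nm}$ given U = u, V = v; 
$A = \prod_k h(t b_k)$, $a_k = \sigma_{u_k v_k}/\sigma_{nm}$. 

All integrals will be taken in the (u, v)-space, with respect to the probability 
measure in that space. 

By (7.10), we have 

$$f^{uv}_{nm}(t) = g^{uv}_{nm}(t) \prod_k h_{u_k v_k}(t a_k),$$

and, since $h(tb) = A\, h(tb')$ with $b'^2 = b^2 - \sum_k b_k^2$, 

$$g_{nm}(t) - h(tb') = \frac{1}{A} \int [f^{uv}_{nm}(t) - h(tb)] + \frac{1}{A} \int g^{uv}_{nm}(t) \Big[\prod_k h(t b_k) - \prod_k h_{u_k v_k}(t a_k)\Big].$$

Hence 

$$|g_{nm}(t) - h(tb')| \le \frac{1}{A} |f_{nm}(t) - h(tb)| + \frac{1}{A} \int_{R_{nm}} \Big|\prod_k h(t b_k) - \prod_k h_{u_k v_k}(t a_k)\Big| + \frac{1}{A} \int_{\bar R_{nm}} \Big|\prod_k h(t b_k) - \prod_k h_{u_k v_k}(t a_k)\Big|.$$

Using the definition of $R_{nm}$, the property (7.11), the condition (7.8), and the 
fact that, by (5.3), $h_{nm}(t) \to h(t)$, each of the three expressions involved can be 
shown to converge to 0 as $1/n + 1/m \to 0$, and the theorem is proved. 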

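8. Consequences of Theorem 3. Since the asymptotic distribution of $S_{nm}$ 
is known, Theorem 3 enables us to investigate the asymptotic behavior of $S'_{nm}$ 
and to compare the tests based on the two statistics. 

THEOREM 4. If F = G, then 

$$\frac{S'_{nm} - \mu_{nm}}{\sigma_{nm}} \xrightarrow{L} \mathcal{N}\Big(0,\ 1 - \sum_k p_k^3\Big) \quad \text{as } \frac1n + \frac1m \to 0.$$

Therefore, if $s_{nm} = s_{nm}(U, V)$ is any sequence of positive statistics satisfying 

$$\frac{s_{nm}}{\sigma_{nm}} \xrightarrow{P} \Big(1 - \sum_k p_k^3\Big)^{1/2},$$

then 

$$T_{nm} = \frac{S'_{nm} - \mu_{nm}}{s_{nm}} \xrightarrow{L} \mathcal{N}(0, 1) \quad \text{as } \frac1n + \frac1m \to 0.$$

PROOF. By (5.3), we know that (7.8) holds with b = 1. We also have 

$$\frac{\sigma^2_{U_k V_k}}{\sigma^2_{nm}} = \frac{U_k V_k (W_k + 1)}{nm(n + m + 1)} = \frac{U_k}{n} \cdot \frac{V_k}{m} \Big(\frac{W_k}{n + m} + \frac{n + m - W_k}{(n + m)(n + m + 1)}\Big) \xrightarrow{P} p_k^3.$$

Hence the theorem follows from Theorem 3. 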

Theorem 4 gives us test statistics whose asymptotic distributions, under the
hypothesis F = G, are independent of F, and which can therefore be used to
obtain asymptotically distribution-free tests. The rejection region of such a test
will be $[T'_{nm} > a]$, where $a$ is given by

$$(2\pi)^{-1/2} \int_a^{\infty} e^{-x^2/2}\,dx = \alpha.$$

We shall refer to tests of this type as "the nonrandomized tests," and to the
Wilcoxon test, based on the randomized $S_{nm}$, as "the randomized test."
Convenient choices for the norming factor $s_{nm}$ are given, e.g., by

(8.1) $s_{nm}^2 = \sigma_{nm}^2 - \frac{1}{12} \sum_k U_k V_k (W_k + 1)$,

or

(8.2) $s_{nm}^2 = \sigma_{nm}^2 - \frac{nm}{12(n + m)(n + m - 1)} \sum_k W_k (W_k^2 - 1)$.


The norming factor given by (8.2), which is, by (7.4), the conditional standard
deviation $\sigma(S'_{nm} \mid W)$ under the hypothesis, was suggested by Kruskal and
Wallis [8]. In this case, Kruskal [7] proved that the conditional distribution of
$T'_{nm}$ (given W) tends to $\mathcal{N}(0, 1)$ if the W's are fixed vectors such that $s_{nm}/\sigma_{nm}$
converges to a positive limit.
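
As a rough illustration of the tie-corrected norming factor of the Kruskal-Wallis type, the following sketch computes a squared norming factor from the sizes of the tied groups. It assumes that $\sigma_{nm}^2$ is the usual null variance $nm(n+m+1)/12$ of the Wilcoxon statistic (its definition lies outside this section) and that the correction term is $nm \sum_k W_k(W_k^2 - 1) / [12(n+m)(n+m-1)]$; the function names are hypothetical.

```python
def sigma2_null(n, m):
    """Assumed form of sigma_nm^2: null variance of the Wilcoxon statistic without ties."""
    return n * m * (n + m + 1) / 12.0


def s2_tie_corrected(n, m, tie_sizes):
    """Squared norming factor with a tie correction.

    tie_sizes lists W_k, the number of pooled observations sharing the
    k-th distinct value; the sizes must add up to n + m.
    """
    total = n + m
    if sum(tie_sizes) != total:
        raise ValueError("tie sizes must sum to n + m")
    correction = n * m * sum(w * (w * w - 1) for w in tie_sizes)
    correction /= 12.0 * total * (total - 1)
    return sigma2_null(n, m) - correction
```

With no ties (every $W_k = 1$) the correction vanishes, and when all observations coincide the corrected variance is zero, as one would expect of a conditional variance given the tie pattern.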


We now turn to the case $F \neq G$ (the alternative of the test). In the
continuous case, it has been shown by Mann-Whitney [10], van Dantzig [1], and
Lehmann [9] that if m/n is held constant, then the Wilcoxon test is consistent (i.e.,
its power tends to 1 as $n \to \infty$) against all alternatives under which
$P(X_1 > Y_1) > \tfrac{1}{2}$. We proceed to derive the analogous consistency property
for the discontinuous case.


THEOREM 5. Let m/n = c be fixed, and

(8.3) $P(X_1 > Y_1) + \tfrac{1}{2} P(X_1 = Y_1) > \tfrac{1}{2}$.

Then the randomized test is consistent. If, moreover, the norming factor $s_{nm}$ satisfies

(8.4) $\dfrac{s_{nm}}{\sigma_{nm}} \xrightarrow{P} h > 0$,

then the nonrandomized test is also consistent.


REMARK. The condition (8.4) is always satisfied if $s_{nm}$ is defined by either
(8.1) or (8.2). For (8.1), we have

$$h^2 = 1 - \frac{1}{1 + c} \sum_k p_k q_k (p_k + c q_k),$$

and for (8.2),

$$h^2 = 1 - \frac{1}{(1 + c)^3} \sum_k (p_k + c q_k)^3,$$

and both quantities are positive, unless F and G are both degenerate and identical,
which is obviously impossible under (8.3).

PROOF OF THEOREM. By (5.9) and Lemma 5.1 of [9], we have
(8.5) — aes 92(0, b”), 
where b = lima... o(Snam)/onm is, by (7.6), a function of c and of the parameters 
6*, \I , and A? . The rejection region of the randomized test is [Tm > a], where 
Tam = (Sam — inm)/Onm- But, by (7.5), we have 


Sam — ESam - 6*nm 


, 


Ton = 


Tnm onm 


and, by (5.2), $nm/\sigma_{nm} \to \infty$ as $n \to \infty$. Hence, by (7.7), the randomized test is
consistent.


Also, from (8.5) it follows, by Theorem 3, that 


, x0, 6), 
where


’ 1 
e = b* ae l+ec dX Pee (Dr + Cx) . 


Hence 


Snm h? 


and the consistency of the nonrandomized test follows by the same argument as 
above, which completes the proof of the theorem. 

9. Asymptotic efficiency. We shall now compare the randomized and the
nonrandomized tests in terms of Pitman's concept of asymptotic relative
efficiency. We shall restrict ourselves to the case of purely discontinuous
distributions. Under a host of conditions (necessary to insure that the conditions of
Theorem A are satisfied), it will be shown that the nonrandomized test is
asymptotically more efficient than the randomized test, and that its asymptotic
efficiency does not depend on the choice of the norming factor $s_{nm}$. The parameter
$\theta$ will be $\theta^*(F, G) = P(X_1 > Y_1) + \tfrac{1}{2} P(X_1 = Y_1) - \tfrac{1}{2}$, and hence $\theta_0 = 0$.

LEMMA. Let $Z_i$ (i = 1, 2, ..., r) be $\mathcal{B}(n a_i, p_i)$, and put $Z'_i = Z_i - n a_i p_i$.
Then


, j2 
Sie = Be Lin (9,2), 


II Z; =n" E + > 3s] ap; + o,(n™*?) : 


i=l i=l Api 


i= 


Here the notation $f_n = o_p(g_n)$ stands, as usual, for $f_n/g_n \xrightarrow{P} 0$. The proof of the
lemma consists of expanding the product $\prod Z_i = \prod (Z'_i + n a_i p_i)$ and noting that
$Z'_i/n^{1/2}$ converges in law (to a normal).

THEOREM 6. Let m/n = c be fixed, and F be a purely discontinuous distribution. 
Let {Ge,0 S @ S 0} be a family of purely discontinuous distributions having the 
same discontinuities §, as F, with jumps q.(0). Let {Gs} have the following proper- 
ties: 

(1) Go = F; 

(2) (0) > g > 0; 

(3) O*(F, Gs) = 9; 


(4) the convergence (Sim — ESam)/@am —-» 210, b’(0)), given by (8.5), is 


uniform in 6; 

(5) the functions q,.(0) are continuous at @ = 0. 

Let 8am = Snm(U, V) be continuous functions of U and V, having, under (F, Ge), 
finite variances and satisfying the following conditions: 

(6) Samn/n? = > a.(0)0, + > B.(0)V. + y(0)n + 0,(n"”), where 

U; =U, - Npk , Vi - mqi.(9) ; 
(7) ¥°(0) = (c(1 + ¢)/12)(1 — Dov pi): 
(8) y(@) is differentiable, and y'(@) is continuous at 6 = 0; 
(9) at least one of the 2K inequalities cOa,(@) ~ y(0)a&(@), c0B,.(0) * y(0)Be
holds, where (arranging the &’s so that & < 41) 
Ge(0) = 1 + cf Di 5x 95(6) + $2(9)], Be = Deion Ds + 4me- 
Under these conditions, the asymptotic relative efficiency of the randomized test
with respect to the nonrandomized test is $1 - \sum_k p_k^3$.
Remarks. (i) Conditions (6) and (9) are satisfied if s,.. is given either by (8.1) or 
by (8.2). For (8.1), for example, using the lemma in this section, we have 


St = 1+ e— DomOlm + en(@)] — X alOl2pr + ea(o)) 


— alps + 2ou(0)) 2 + off), 


and using the Taylor expansion for (a + br)"” we get (6). The same method 
works for (8.2). 


(ii) Condition (7) is necessary, by Theorem 4, to make s,, an admissible 
norming factor. It is satisfied for the choices (8.1) and (8.2) if 


|Sa-Lem | -0, 


60 


which is analogous to the condition go = p» in Theorem 2. 
Proor or THEOREM. In terms of Theorem A, put 


Sis - a. San = Sn — Unm, 


1/2 


Yin) = Tn, Yaa(@) = On, 
c(1l + c) 
12n*y*(6) 

G2n(0) = o(Sam | 4). 
We have then 


ain (8) = o([Sam ~~ Vin(9) Sum | 6) ’ 


12cn 2 n'm 
ee ’ H = . 
(1 + c)(1 — Doe pi) an) Vass 


The verification of (2.1)-(2.3) is routine, and we proceed to verify (2.4). The 
convergences involved are all uniform in 6; except for the one required by condi- 
tion (4), this follows from condition (2) and Theorem B. (All the usual binomial 
convergences, when put in terms of characteristic functions, are seen to be uni- 
form as long as the probability parameters are bounded away from 0 and 1.) 

We have 


Hi(n) = 


Son 7 Von (8) aie Sam ioe ESam 
O2n(8) o(Snm) . 
which, by (5.9), verifies (2.4) for i = 2. Also, we have


/ 


Sin — Vin (8) aes Sam — Mam — Vin(9) Sam 
7in(O) Tin (8) Sam : 


Arranging the &’s in ascending order of magnitude, we have 
, W. 
s.. => u(E W; + wett), 
k i<k 2 
It follows, by the lemma in this section, that 


Som = Sau 0e + LAPe + mH) + of(n), 


4(0) = Doe pul Dose [ps + 095(0)] + px + cge())}. 


l ow - cb 
2 [Sam > Vin (8) Sum] = x | ao — (6) cal) | U, 


+ x E > as) |? + 0,(n””) i 


(6) 


Because of (9), this expression is asymptotically normal, and it is easily shown
that (2.4) holds for i = 1. Hence, by Theorem A, the asymptotic relative
efficiency of the randomized test with respect to the nonrandomized test is


H;(n) a aia ake 3 
lim Fig) 1 oe Ps 


and the theorem is proved. 


10. Acknowledgment. The author wishes to thank Professor E. L. Lehmann 
for his suggestion of the problem and for his helpful interest in the work. 


REFERENCES 


[1] D. VAN DANTZIG, "On the consistency and the power of Wilcoxon's two-sample test,"
Nederl. Akad. Wetensch., Proc., Vol. 54 (1951), pp. 1-8.

[2] W. J. DIXON AND F. J. MASSEY, JR., An introduction to statistical analysis,
McGraw-Hill Book Co., 1951, p. 248.

[3] W. J. DIXON AND A. M. MOOD, "The statistical sign test," J. Amer. Stat. Assn., Vol.
41 (1946), pp. 557-566.

[4] J. HEMELRIJK, "A theorem on the sign test when ties are present," Nederl. Akad.
Wetensch., Proc., Vol. 55 (1952), pp. 322-326.

[5] J. HEMELRIJK, "Note on Wilcoxon's two sample test when ties are present," Ann. Math.
Stat., Vol. 23 (1952), pp. 133-135.

[6] W. HOEFFDING, "A class of statistics with asymptotically normal distributions,"
Ann. Math. Stat., Vol. 19 (1948), pp. 293-325.





386 JOSEPH PUTTER 


[7] W. H. KRUSKAL, "A nonparametric test for the several-sample problem," Ann. Math.
Stat., Vol. 23 (1952), pp. 525-540.

[8] W. H. KRUSKAL AND W. A. WALLIS, "Use of ranks in one-criterion variance analysis,"
J. Amer. Stat. Assn., Vol. 47 (1952), pp. 583-621.

[9] E. L. LEHMANN, "Consistency and unbiasedness of certain nonparametric tests,"
Ann. Math. Stat., Vol. 22 (1951), pp. 165-179.

[10] H. B. MANN AND D. R. WHITNEY, "On a test of whether one of two random variables is
stochastically larger than the other," Ann. Math. Stat., Vol. 18 (1947), pp. 50-60.

[11] G. E. NOETHER, "Asymptotic properties of the Wald-Wolfowitz test of randomness,"
Ann. Math. Stat., Vol. 21 (1950), pp. 231-246.

[12] E. PARZEN, "On uniform convergence of families of sequences of random variables,"
Univ. California Publ. Stat., Vol. 2 (1954), pp. 23-54.

[13] Probability Tables for the Wilcoxon Test When There are Ties, National Bureau of
Standards Report No. 1859, Washington, D. C.





ON A CLASS OF DECISION PROCEDURES FOR RANKING
MEANS OF NORMAL POPULATIONS¹


By K. C. SEAL
University of North Carolina


Summary. An infinite class of decision rules having several desirable proper- 
ties is suggested for choosing a group of populations from a given set of normal 
populations which should contain the population with the largest mean. The 
problem of selecting one member from this infinite class of rules has also been 
studied. 


1. Introduction. In recent years it has been recognized ([1], [3], [4], [5], [9],
[10], [11]) that the conventional test of homogeneity, such as the F-test in the 
analysis of variance for testing the equality of several population means, does 
not supply all the information that the experimenter seeks. In many practical 
situations it is unrealistic to assume that the population means of several essen- 
tially different populations will be equal. A sufficiently large sample will thus 
enable the experimenter to detect this difference at any preassigned level. In 
most cases what the experimenter actually wants is a decision procedure which 
would tell him which population or populations possess a desired characteristic. 
For example, the experimenter may be interested in determining the population 
with the largest mean, from a set of normal populations. Alternatively he may 
desire to select from a given number of populations a group containing the popu- 
lation having the largest mean. 

Suppose there are n + 1 normal populations $N(\mu_i, \sigma_1^2)$, i = 0, 1, 2, ..., n,
with unknown means and a common but unknown variance and that k random
observations $x_{i\alpha}$ (i = 0, 1, ..., n; $\alpha$ = 1, 2, ..., k) from each of the n + 1
normal populations are given, where $x_{i\alpha}$ is one of the k observations from the
ith population. Under our assumptions the n + 1 sample means

$$\bar{x}_i = \sum_{\alpha=1}^{k} x_{i\alpha} / k$$

will obey $N(\mu_i, \sigma_1^2/k)$, i = 0, 1, ..., n, and an estimate

$$s_1^2 = \sum_{i=0}^{n} \sum_{\alpha=1}^{k} (x_{i\alpha} - \bar{x}_i)^2 / (k - 1)(n + 1)$$

of $\sigma_1^2$ can be obtained which is independent of the sample means $\bar{x}_i$, i = 0, 1,
..., n. We may, therefore, assume for mathematical convenience that just one
random observation $x_i$ from each of n + 1 normal populations $N(\mu_i, \sigma^2)$, i =
0, 1, ..., n, is given, where $\sigma^2 = \sigma_1^2/k$ is estimated by

$$s^2 = s_1^2/k = \sum_{i=0}^{n} \sum_{\alpha=1}^{k} (x_{i\alpha} - \bar{x}_i)^2 / k(k - 1)(n + 1)$$

and this is assumed to be known. Clearly this estimate $s^2$ of $\sigma^2$ is stochastically
independent of the given observations $x_i$, i = 0, 1, ..., n. It is desired to choose
a group of populations from the above n + 1 populations, with the help of some
decision rule which ensures that the least upper bound of the probability of not
including in the group the population with the largest mean is $\alpha$ ($0 < \alpha < 1$),
whatever may be the unknown $\mu_i$'s. Subject to this fundamental requirement we
would like the rule to possess other desirable properties such as:

(a) The property of unbiasedness, i.e., the probability of rejecting any popula- 
tion not having the largest mean is not less than the probability of rejecting the 
population having the largest mean. (Analogy of this property to the property of 
unbiasedness in the theory of testing of hypothesis should be noticed.) 

(b) The property of gradation, i.e., corresponding to any $\alpha$ ($0 < \alpha < 1$),
there exists a constant $\mu_{0\alpha}$ such that the chance of retaining the population with
mean $\mu_0$ in the group is greater or less than $\alpha$, according as $\mu_0$ is greater or less
than $\mu_{0\alpha}$. The constant $\mu_{0\alpha}$ will in general depend on the decision rule as well as
the unknown means of the remaining n populations, and the common variance $\sigma^2$.

An infinite class $\mathcal{C}$ of decision rules satisfying the fundamental requirement,
together with the properties (a) and (b) is given in Section 2.1. Certain interesting
properties of this class are studied in Section 3. The question of choosing one
member from this infinite class having further desirable properties has been
studied in Section 4.


2. Class $\mathcal{C}$ of decision procedures.

2.1. Let $y_i$, i = 0, 1, ..., n, be n + 1 random observations from $N(0, \sigma^2)$
and let $y_{(1)} < y_{(2)} < \cdots < y_{(n)}$ be n ranked observations among $y_1, \dots, y_n$.
The y's will then define another set of random variables $Y_{(i)}$, i = 1, ..., n.
It is assumed $y_i \neq y_j$, $i \neq j$, since the set of points $(y_0, y_1, \dots, y_n)$ in
(n + 1)-dimensional Euclidean space where $y_i \neq y_j$, $i \neq j$ will be obtained with
probability 1. Let $t_\alpha(c_1, \dots, c_n)$ ($c_i \geq 0$, i = 1, ..., n, $\sum_1^n c_i = 1$) denote the
upper 100$\alpha$% point in the probability density function (pdf) of

(2.1.1) $t(c_1, \dots, c_n) = \Bigl(\sum_{i=1}^{n} c_i y_{(i)} - y_0\Bigr) / s.$

The class $\mathcal{C}$ of decision rules $D(c_1, \dots, c_n)$ ($c_i \geq 0$, i = 1, ..., n, $\sum_1^n c_i = 1$)
is defined as follows:
"Reject any observation $x_0$ from the given observations $x_i$, i = 0, 1, ..., n,
if

(2.1.2) $\sum_{i=1}^{n} c_i x_{(i)} - x_0 > s\,t_\alpha(c_1, \dots, c_n)$

and accept otherwise, where $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ are n ordered observations
among $(x_1, x_2, \dots, x_n)$. (The n + 1 observations $x_i$, i = 0, 1, ..., n, taken
from normal populations are again assumed to be distinct.) Proceed as above for
each of n + 1 observations separately, so that each of n + 1 observations in
due turn takes the place of $x_0$ and the remaining ordered observations play the
part of $x_{(1)}, \dots, x_{(n)}$." Thus in the above procedure we may start with the
largest observation among $x_i$, i = 0, 1, ..., n, as $x_0$ and work downwards. If
any particular observation is rejected, all other observations smaller than this
observation are automatically rejected.

For the sake of convenience, we shall denote the decision rule $D(c_1, \dots, c_n)$
when (i) $c_i = 1/n$, i = 1, ..., n, by D and when (ii) $c_r = 1$, and $c_j = 0$,
$j \neq r$ by D(r), $1 \leq r \leq n$. The corresponding auxiliary statistics $t(c_1, \dots, c_n)$
will be denoted by $\bar{t}$ and t(r).

It may also be noted here that $\sum_{i=1}^{n} c_i x_{(i)} - x_0$ ($c_i \geq 0$, i = 1, ..., n,
$\sum_1^n c_i = 1$) can be written in the alternative form $\sum_{i=1}^{n} c_i (x_{(i)} - x_0)$.
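
The rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the critical value `t_alpha` is taken as given (for example from tables or a separate simulation), the observations are assumed distinct, and the function names are hypothetical.

```python
def rejection_statistic(x0, others, c):
    """Weighted sum of the ordered remaining observations minus x0."""
    ordered = sorted(others)
    return sum(ci * xi for ci, xi in zip(c, ordered)) - x0


def select_group(x, c, s, t_alpha):
    """Return the indices of the observations retained by the rule.

    x       : the n + 1 observations, one per population
    c       : n nonnegative weights summing to 1
    s       : estimate of the common standard deviation
    t_alpha : assumed-known upper 100*alpha % point of the auxiliary statistic
    """
    x = list(x)
    if len(c) != len(x) - 1:
        raise ValueError("need exactly n weights for n + 1 observations")
    if any(ci < 0 for ci in c) or abs(sum(c) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    retained = []
    for i, x0 in enumerate(x):
        others = x[:i] + x[i + 1:]  # each observation takes the place of x0 in turn
        if rejection_statistic(x0, others, c) <= s * t_alpha:
            retained.append(i)
    return retained
```

For instance, with observations (0, 1, 5), equal weights (1/2, 1/2), s = 1 and an illustrative critical value of 1, only the largest observation is retained.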


3. Some properties of class C. 

3.1. An inequality related to location parameters. 

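THEOREM 3.1.1. Suppose that $F((x_1 - \mu_1)/\sigma_1, \dots, (x_n - \mu_n)/\sigma_n)$ is the
cumulative distribution function (cdf) of n random variables $X_i$, i = 1, ..., n,
and $T(u_1, \dots, u_n)$ is a real-valued function of $u_i$, i = 1, ..., n, such that

(3.1.1) $T(u_1 + a_1, \dots, u_n + a_n) \geq T(u_1, \dots, u_n)$,

where $(a_1, \dots, a_n)$ is a set of real numbers and $-\infty < u_i < \infty$, i = 1, ..., n.
If for an arbitrary constant k,

$$P[\,T(X_1, \dots, X_n) > k \mid \mu_1, \dots, \mu_n;\ \sigma_1, \dots, \sigma_n\,]$$

denotes the probability of $T(X_1, \dots, X_n) > k$ when $X_1, \dots, X_n$ have the cdf
$F((x_1 - \mu_1)/\sigma_1, \dots, (x_n - \mu_n)/\sigma_n)$, then

$$P[\,T(X_1, \dots, X_n) > k \mid \mu_1 + a_1, \dots, \mu_n + a_n;\ \sigma_1, \dots, \sigma_n\,]
\geq P[\,T(X_1, \dots, X_n) > k \mid \mu_1, \dots, \mu_n;\ \sigma_1, \dots, \sigma_n\,].$$

PROOF.

$$P[\,T(X_1, \dots, X_n) > k \mid \mu_1 + a_1, \dots, \mu_n + a_n;\ \sigma_1, \dots, \sigma_n\,]$$
$$= P[\,T(X_1 + a_1, \dots, X_n + a_n) > k \mid \mu_1, \dots, \mu_n;\ \sigma_1, \dots, \sigma_n\,]$$
$$\geq P[\,T(X_1, \dots, X_n) > k \mid \mu_1, \dots, \mu_n;\ \sigma_1, \dots, \sigma_n\,],$$

since $T(X_1 + a_1, \dots, X_n + a_n) \geq T(X_1, \dots, X_n)$ by hypothesis. Q.E.D.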

From this theorem the following corollaries readily follow:

COROLLARY 3.1.1. If (3.1.1) is satisfied for all $a_i \geq 0$, i = 1, ..., n, then
$P[T(X_1, \dots, X_n) > k]$ is a nondecreasing function of each $\mu_i$, i = 1, ..., n.

COROLLARY 3.1.2. If

$$T(u_1 + a_1, \dots, u_n + a_n) > T(u_1, \dots, u_n),$$

when $a_i \geq 0$ and $a_i > 0$ for at least one i, $1 \leq i \leq n$, and if the cdf of
$T(X_1, \dots, X_n)$ assigns a positive measure to every nondegenerate interval, then
$P[T(X_1, \dots, X_n) > k]$, where k is an arbitrary constant, is an increasing function
of each $\mu_i$, i = 1, ..., n.

COROLLARY 3.1.3. Any strictly monotonic functional of the cdf of
$T(X_1, \dots, X_n)$, which satisfies the conditions of Corollary 3.1.2, is an increasing
function of $\mu_i$, i = 1, ..., n.


EXAMPLE 1. Consider the pdf

$$f\Bigl(\frac{x_1 - \mu_1}{\sigma_1}, \dots, \frac{x_n - \mu_n}{\sigma_n}\Bigr) = \prod_{i=1}^{n} (\sigma_i \sqrt{2\pi})^{-1} \exp\bigl[-(x_i - \mu_i)^2 / 2\sigma_i^2\bigr]$$

and $T(X_1, \dots, X_n) = \bigl(\sum_1^n c_i X_{(i)}\bigr)^{2r+1}$, where r = 0, 1, 2, ... and $c_i \geq 0$,
i = 1, ..., n, and $c_i > 0$ for at least one i.

Here the conditions of Corollary 3.1.2 are easily verified and it follows that

$$P\Bigl[\Bigl(\sum_{i=1}^{n} c_i X_{(i)}\Bigr)^{2r+1} > t\Bigr], \quad r = 0, 1, 2, \dots,$$

is an increasing function of each $\mu_i$, i = 1, ..., n. This result for the particular
case r = 0 will be used in Section 3 in proving the properties of unbiasedness and
gradation for the class $\mathcal{C}$ of decision rules as defined in Section 2.1.

It is well known that if F(x) is the cdf of a random variable X, then the
expectation E(X), if it exists, is a strictly monotonic functional of F (cf. [6], pp. 152-153;
[12], p. 189). Hence we get from Corollary 3.1.3 that

$$E\Bigl(\sum_{i=1}^{n} c_i X_{(i)}\Bigr)^{2r+1}, \quad r = 0, 1, 2, \dots,$$

is an increasing function of each of $\mu_i$, i = 1, ..., n. Much more complicated
functions can be constructed (cf. [13], pp. 25-26) having a similar property.
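
The shift argument behind Theorem 3.1.1 can be illustrated numerically for the case r = 0 of Example 1. The sketch below is an assumption-laden illustration, not part of the paper: it reuses the same simulated standard normal noise for every choice of means, so that raising any mean can only raise each ordered value and hence the weighted sum. The names `weighted_order_stat` and `exceed_prob` are hypothetical.

```python
import random


def weighted_order_stat(x, c):
    """T(x) = sum of c_i times the i-th smallest value of x (case r = 0)."""
    return sum(ci * xi for ci, xi in zip(c, sorted(x)))


def exceed_prob(mu, c, k, noise):
    """Empirical P[T(X) > k] with X_i = mu_i + z_i, reusing the same noise draws."""
    hits = 0
    for z in noise:
        x = [m + zi for m, zi in zip(mu, z)]
        if weighted_order_stat(x, c) > k:
            hits += 1
    return hits / len(noise)


rng = random.Random(0)
n = 3
noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(20000)]
c = [0.2, 0.3, 0.5]

p_low = exceed_prob([0.0, 0.0, 0.0], c, 0.5, noise)
p_mid = exceed_prob([0.0, 0.4, 0.0], c, 0.5, noise)
p_high = exceed_prob([0.3, 0.4, 1.0], c, 0.5, noise)
```

Because the noise is shared, the three estimates are ordered exactly as the theorem predicts, not merely up to sampling error.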

3.2. Property of unbiasedness. Let $\Omega(\mu_1, \dots, \mu_n; \sigma)$ denote the set of normal
populations $N(\mu_i, \sigma^2)$, i = 1, 2, ..., n. Suppose that $X_{(1)} < \cdots < X_{(n)}$ are
n order statistics from $\Omega(\mu_1, \dots, \mu_n; \sigma)$ when one random observation from each
of these n normal populations with common variance equal to $\sigma^2$ is taken. Let
$X_0$ be another independent variate obeying $N(\mu_0, \sigma^2)$. According to our decision
rule $D(c_1, \dots, c_n)$ as defined in Section 2.1, the probability of rejecting $x_0$ will
then be given by

(3.2.1) $P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - X_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr]
= \int_0^{\infty} ds \int_{-\infty}^{\infty} dx_0 \int_A p(s)\,\phi(x_0 \mid \mu_0)\,
g(x_{(1)}, \dots, x_{(n)} \mid \mu_1, \dots, \mu_n)\,dx_{(1)} \cdots dx_{(n)},$

where

(3.2.2) $A = \Bigl\{ -\infty < x_{(1)} < \cdots < x_{(n)} < \infty;\ \sum_{i=1}^{n} c_i x_{(i)} > x_0 + s\,t_\alpha(c_1, \dots, c_n) \Bigr\},$

$\phi(x_0 \mid \mu_0)$ is the $N(\mu_0, \sigma^2)$ pdf of $X_0$,
$g(x_{(1)}, \dots, x_{(n)} \mid \mu_1, \dots, \mu_n)$ represents the pdf of $X_{(1)}, \dots, X_{(n)}$ from
$\Omega(\mu_1, \dots, \mu_n; \sigma)$ and

(3.2.3) $p(s) = \dfrac{\nu^{\nu/2}}{2^{(\nu-2)/2}\,\Gamma(\nu/2)\,\sigma^{\nu}}\; s^{\nu-1} e^{-\nu s^2 / 2\sigma^2},$

i.e., the pdf of sample standard deviation s based on $\nu = (k - 1)(n + 1)$ (cf.
Section 1) degrees of freedom.

THEOREM 3.2.1. $P\bigl[\sum_1^n c_i X_{(i)} - X_0 > s\,t_\alpha(c_1, \dots, c_n)\bigr]$ is an increasing
function of each $\mu'_i = \mu_i - \mu_0$, i = 1, ..., n.

PROOF. Let $X'_i = X_i - \mu_0$, i = 0, 1, 2, ..., n. Since $\sum_1^n c_i = 1$,

(3.2.4) $P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - X_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr]
= P\Bigl[\sum_{i=1}^{n} c_i X'_{(i)} > X'_0 + s\,t_\alpha(c_1, \dots, c_n)\Bigr].$

For fixed values of $X'_0$ and s, the conditional value of

$$P\Bigl[\sum_{i=1}^{n} c_i X'_{(i)} > X'_0 + s\,t_\alpha(c_1, \dots, c_n)\Bigr]$$

is an increasing function of $\mu'_i = \mu_i - \mu_0$, by Corollary 3.1.2 and Example 1
of Section 3.1. Since the distribution of $X'_0$ and s does not involve the $\mu'_i$, it is
now obvious that the (unconditional) value of

$$P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - X_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr]$$

is an increasing function of each $\mu'_i$. Q.E.D.

From this theorem an interesting property of $D(c_1, \dots, c_n)$ follows.

COROLLARY 3.2.1. The probability of rejecting any undesirable population (i.e.,
any population which has not the largest mean) is never less than the probability
of rejecting the desirable population (i.e., that population having the largest mean).

PROOF. For the present proof let $\mu_{(0)} \geq \cdots \geq \mu_{(n)}$ denote the means of the
given n + 1 normal populations with common variance $\sigma^2$. By our decision rule
$D(c_1, \dots, c_n)$ the probability of rejecting the desirable population $N(\mu_{(0)}, \sigma^2)$
will depend on

(3.2.5) $P_{c_1, \dots, c_n}(\mu_{(1)} - \mu_{(0)}, \dots, \mu_{(n)} - \mu_{(0)}),$

which is defined as the conditional probability of

$$\sum_{i=1}^{n} c_i Y_{(i)} > y_0 + s\,t_\alpha(c_1, \dots, c_n),$$

when $y_0$ and s are assumed to be held constant. Here $Y_{(i)}$, i = 1, ..., n, and
$Y_0$ are defined as in Section 2.1. The probability of rejecting any undesirable
population $N(\mu_{(i)}, \sigma^2)$, i = 1, 2, ..., n, will, on the other hand, involve

(3.2.6) $P_{c_1, \dots, c_n}(\mu_{(0)} - \mu_{(i)},\ \mu_{(1)} - \mu_{(i)},\ \dots,\ \mu_{(i-1)} - \mu_{(i)},\ \mu_{(i+1)} - \mu_{(i)},\ \dots,\ \mu_{(n)} - \mu_{(i)}).$

Comparing the arguments of $P_{c_1, \dots, c_n}$ in (3.2.5) and (3.2.6) we notice that

$$\mu_{(j)} - \mu_{(i)} \geq \mu_{(j)} - \mu_{(0)}, \qquad \mu_{(0)} - \mu_{(i)} \geq \mu_{(i)} - \mu_{(0)},$$

where i = 1, 2, ..., n, j = 1, 2, ..., n; $j \neq i$. Thus we can make a one to
one correspondence between the n arguments of $P_{c_1, \dots, c_n}$ in (3.2.5) and (3.2.6) in
such a way that no argument of $P_{c_1, \dots, c_n}$ in (3.2.6) is less than the corresponding
argument of $P_{c_1, \dots, c_n}$ in (3.2.5). Hence from the monotonic behavior of
$P_{c_1, \dots, c_n}(\delta_1, \dots, \delta_n)$ with regard to $\delta_i$, i = 1, ..., n, it follows that the probability of
rejecting any undesirable population $N(\mu_{(i)}, \sigma^2)$, i = 1, ..., n, is never less
than the probability of rejecting the desirable population $N(\mu_{(0)}, \sigma^2)$. This
property may be denoted by the property of unbiasedness which is therefore
possessed by our decision rules $D(c_1, \dots, c_n)$ ($c_i \geq 0$, $\sum_1^n c_i = 1$). It may also
be noted that all the arguments of $P_{c_1, \dots, c_n}$ in (3.2.5) are nonpositive and so
(3.2.5) will not exceed $P_{c_1, \dots, c_n}(0, \dots, 0)$. This implies that the probability of
rejecting the desirable population $N(\mu_{(0)}, \sigma^2)$ will not exceed the desired
significance level $\alpha$ ($0 < \alpha < 1$). Hence $\alpha$ will be the least upper bound of the
probability of incorrect choice (i.e., not including the population with the largest
mean in the selected group), whatever may be the population means. Thus any
rule $D(c_1, \dots, c_n)$ satisfies the fundamental requirement as stated in Section 1.
3.3. Property of gradation. From Theorem 3.2.1 it follows that

(3.3.1) $P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - X_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr]$

is a decreasing function of $\mu_0$; when $\mu_0 \to -\infty$, the value of (3.3.1) is equal to 1
and when $\mu_0 \to +\infty$, the same value is equal to 0. It is easily seen from (3.2.1)
that (3.3.1) is a continuous function of $\mu_0$. Hence corresponding to any assigned
value $\gamma$ ($0 < \gamma < 1$) of (3.3.1) there exists a particular value $\mu_{0\gamma}$ of $\mu_0$ for
which (3.3.1) is exactly equal to $\gamma$. The value $\mu_{0\gamma}$ will clearly in general depend
upon $\mu_1, \dots, \mu_n$, $\sigma$ and $c_1, \dots, c_n$ besides the assigned value $\gamma$, and if
$\mu_1, \mu_2, \dots, \mu_n$ increase by a given constant $\Delta$, then $\mu_{0\gamma}$ will also be increased
by the same constant. In this situation we shall, therefore, find that

(3.3.2) $P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - X_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr] \lessgtr \gamma$

according as $\mu_0 \gtrless \mu_{0\gamma}$. This property will be designated as the property of
gradation. We shall now study the nature of the unknown constant $\mu_{0\gamma}$ when $\gamma$
is taken to be equal to $\alpha$ ($0 < \alpha < 1$). It will be shown that $\mu_{0\alpha}$ for the decision
rule D is very simple in form, but for other decision rules of class $\mathcal{C}$ no such simple
explicit expression for $\mu_{0\alpha}$ can be given. Let $\bar{X}_n = \sum_1^n X_{(i)}/n$. Then $\bar{X}_n$ will obey
$N(\sum_1^n \mu_i/n, \sigma^2/n)$. Let $Y_{(1)} < \cdots < Y_{(n)}$ be n order statistics derived from a
random sample of size n from $N(0, \sigma^2)$. Also let $Y_0 = X_0 - \mu_0$, so that $Y_0$ obeys
$N(0, \sigma^2)$. Clearly the distribution of $\bar{X}_n - \sum_1^n \mu_i/n$ is identical with that of
$\bar{Y}_n = \sum_1^n Y_{(i)}/n$.

Under our decision rule D the probability of rejecting $x_0$ in a single rejection is
equal to

$$P[\bar{X}_n - X_0 > s\,\bar{t}_\alpha] = P\Bigl[(\bar{Y}_n - Y_0) + \Bigl(\sum_1^n \mu_i/n - \mu_0\Bigr) > s\,\bar{t}_\alpha\Bigr]
\lessgtr P[\bar{Y}_n - Y_0 > s\,\bar{t}_\alpha] = \alpha,$$

according as $\mu_0 \gtrless \sum_1^n \mu_i/n$. Thus for D we have the special property that the
probability of rejecting any population whose mean is greater than the average
of the remaining n population means is less than $\alpha$ and the probability of
rejecting any population whose mean is not greater than the average of the remaining
n population means is at least equal to $\alpha$.

It is now shown that $\mu_{0\alpha}$ for the general decision rule $D(c_1, \dots, c_n)$ is not in
general equal to $E(\sum_1^n c_i X_{(i)})$, although we have just shown that this is true for
D.

The existence of $\mu_{0\alpha}$ (which is a function of $\mu_1, \dots, \mu_n$, $\sigma$; $c_1, \dots, c_n$
besides $\alpha$) for which the property of gradation holds for the general decision rule
has already been shown. This implies that

(3.3.3) $\alpha = P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - X_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr],$

when $E(X_0) = \mu_{0\alpha}(\mu_1, \dots, \mu_n, \sigma; c_1, \dots, c_n, \alpha)$. It is shown that the
assumption $\mu_{0\alpha} = E(\sum_1^n c_i X_{(i)})$ (which implies that $\mu_{0\alpha}$ is independent of $\alpha$)
leads to a contradiction for the general case. The right-hand side of (3.3.3) can
be written as

(3.3.4) $P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - \mu_{0\alpha} - (X_0 - \mu_{0\alpha}) > s\,t_\alpha(c_1, \dots, c_n)\Bigr].$

But $Y_0 = X_0 - \mu_{0\alpha}$ obeys $N(0, \sigma^2)$; hence by (3.3.3), (3.3.4), and the definition
of $t_\alpha(c_1, \dots, c_n)$ we get

(3.3.5) $P\Bigl[\sum_{i=1}^{n} c_i Y_{(i)} - Y_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr] = \alpha
= P\Bigl[\sum_{i=1}^{n} c_i X_{(i)} - \mu_{0\alpha} - Y_0 > s\,t_\alpha(c_1, \dots, c_n)\Bigr],$

where $Y_{(1)} < \cdots < Y_{(n)}$ are n order statistics obtained from a sample of size
n from $N(0, \sigma^2)$. Hence it follows that, when $\mu_{0\alpha}$ is assumed to be independent of
$\alpha$, the distribution of $\sum_1^n c_i X_{(i)} - \mu_{0\alpha}$ and $\sum_1^n c_i Y_{(i)}$ must be identical. As a
necessary condition for this we then have

(3.3.6) $E\Bigl(\sum_1^n c_i Y_{(i)}\Bigr) = E\Bigl(\sum_1^n c_i X_{(i)} - \mu_{0\alpha}\Bigr) = E\Bigl(\sum_1^n c_i X_{(i)}\Bigr) - \mu_{0\alpha}.$

Hence if we assume that the unknown constant $\mu_{0\alpha}$ is $E(\sum_1^n c_i X_{(i)})$, then it will
follow that

(3.3.7) $E\Bigl(\sum_1^n c_i Y_{(i)}\Bigr) = 0,$

for an arbitrary set of $c_i$'s such that $c_i \geq 0$ and $\sum_1^n c_i = 1$. The equation (3.3.7)
does not, however, hold in general and hence we arrive at the conclusion that
$\mu_{0\alpha}$ is not in general equal to $E(\sum_1^n c_i X_{(i)})$. We can, however, easily derive the
value of $\mu_{0\alpha}$ for $D(c_1, \dots, c_n)$ when $\mu_1 = \mu_2 = \cdots = \mu_n$. It can be easily
shown (cf. [13], p. 70) that in such a situation $\mu_{0\alpha}$ must also be equal to $\mu_1$.
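
That the expected value of a weighted sum of normal order statistics need not vanish for unequal weights can be checked with a small simulation. The sketch below assumes standard normal samples of size 2 and compares equal weights with all weight on the larger observation, whose expectation is known to be $1/\sqrt{\pi}$; the function name is hypothetical.

```python
import math
import random


def mean_weighted_order_stat(c, reps, seed=0):
    """Monte Carlo estimate of E(sum_i c_i Y_(i)) for a N(0, 1) sample of size len(c)."""
    rng = random.Random(seed)
    n = len(c)
    total = 0.0
    for _ in range(reps):
        y = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        total += sum(ci * yi for ci, yi in zip(c, y))
    return total / reps


equal_weights = mean_weighted_order_stat([0.5, 0.5], 200000)
largest_only = mean_weighted_order_stat([0.0, 1.0], 200000)
expected_max = 1.0 / math.sqrt(math.pi)  # exact mean of the larger of two N(0, 1) draws
```

The equal-weights estimate should be close to zero, while the estimate for the largest observation should be close to $1/\sqrt{\pi} \approx 0.564$, which is the kind of nonzero value that rules out the assumption in the general case.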


4. Selection of an optimum rule. 

4.1. In this section we shall assume that the number of degrees of freedom
(k - 1)(n + 1) of s (cf. Section 1) is so large that $\sigma$ may be considered to be
known. Under this restriction the rule $D(c_1, \dots, c_n)$ as described in Section 2.1
requires the obvious modification that s should be replaced throughout by the
population standard deviation $\sigma$.

It has been shown that the class $\mathcal{C}$ of decision rules satisfies the fundamental
requirement, i.e., the least upper bound of the probability of rejecting the
population having the largest mean from the selected group is $\alpha$ ($0 < \alpha < 1$),
whatever may be the means of n + 1 given normal populations. If among the n + 1
population means all means except one are equal, then obviously it would be
desirable to select that rule from the class $\mathcal{C}$ which

(i) maximizes the probability of retaining in the selected group the population
with the unequal mean if this is larger than the common mean of the other n
populations; and

(ii) maximizes the probability of not retaining the population with the
unequal mean if this is smaller than the mean of the other n populations.

In case (i) the population with the largest mean will be designated as the
"best" population, and in case (ii) the population with the smallest mean will be
called the "worst" population. Thus if $X_{(1)} < \cdots < X_{(n)}$ are assumed to have
come from $N(0, \sigma^2)$ and $X_0$ from $N(\delta, \sigma^2)$, then our desirable rule should ensure
largest probability (i) for retaining $x_0$ in this selected group if $0 < \delta < \infty$, or,
(ii) for rejecting $x_0$ from the group if $-\infty < \delta < 0$. From what we have observed
in Section 3.2 it is clear that the above rule will be optimum when $X_{(1)} < \cdots <
X_{(n)}$ are assumed to arise from $N(\mu, \sigma^2)$ and $X_0$ from $N(\mu + \delta, \sigma^2)$,
$-\infty < \mu < \infty$.

We shall now show that among the class $\mathcal{C}$ of decision rules the rule D
maximizes (approximately) the probability of retaining the "best" population in the
selected group. In an exactly analogous way it can be shown that D maximizes
also the probability of rejecting the "worst" population from the group. To
derive this result we shall first prove the following:

LEMMA 4.1.1. Let $Y_{(1)} < \cdots < Y_{(n)}$ be n order statistics from N(0, 1). Then
$\sum_1^n Y_{(i)}/n = \sum_1^n Y_i/n$ has minimum variance among all $\sum_1^n c_i Y_{(i)}$ such that
$\sum_1^n c_i = 1$.

PROOF. We have

(4.1.1) $\operatorname{Var}\Bigl(\sum_{i=1}^{n} c_i Y_{(i)}\Bigr) = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j v_{ij},$

where $v_{ij}$ denotes the covariance between $Y_{(i)}$ and $Y_{(j)}$. Let the
variance-covariance matrix of $Y_{(i)}$ and $Y_{(j)}$ (i = 1, ..., n; j = 1, ..., n) be denoted by
$\boldsymbol{\Sigma}$ (n × n).

To minimize (4.1.1) subject to the condition

(4.1.2) $\sum_{i=1}^{n} c_i = 1,$

we get the following n equations

(4.1.3) $\sum_{j=1}^{n} c_j v_{ij} = \lambda, \quad i = 1, \dots, n,$

where $2\lambda$ is used as Lagrangian multiplier. In matrix notation equations (4.1.3)
can be written as

(4.1.3a) $\boldsymbol{\Sigma} c = \lambda \mathbf{1},$

where $c'$ (1 × n) and $\mathbf{1}'$ (1 × n) denote the row vectors $(c_1, \dots, c_n)$ and
(1, 1, ..., 1) respectively. Since $\boldsymbol{\Sigma}$ is nonsingular, we get from (4.1.3a)

(4.1.4) $c = \lambda \boldsymbol{\Sigma}^{-1} \mathbf{1}.$

But it is known ([7], [8]) that

(4.1.5) $\sum_{j=1}^{n} v_{ij} = 1.$

Hence $\boldsymbol{\Sigma} \mathbf{1} = \mathbf{1}$; this implies

(4.1.6) $\boldsymbol{\Sigma}^{-1} \mathbf{1} = \mathbf{1}.$

By (4.1.2), (4.1.4), and (4.1.6) it follows that $c_i = 1/n$, i = 1, ..., n.

This completes the proof of the lemma.
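
The two facts used in the proof, that each row of the covariance matrix of standard normal order statistics sums to 1 and that equal weights give the smallest variance, can be checked by simulation. The following is a minimal sketch for n = 3 using a Monte Carlo estimate of the covariance matrix; the helper names are hypothetical, and the comparison with the sample median is only one illustrative competitor.

```python
import random


def order_stat_cov(n, reps, seed=0):
    """Monte Carlo estimate of the covariance matrix of N(0, 1) order statistics."""
    rng = random.Random(seed)
    sums = [0.0] * n
    prods = [[0.0] * n for _ in range(n)]
    for _ in range(reps):
        y = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        for i in range(n):
            sums[i] += y[i]
            for j in range(n):
                prods[i][j] += y[i] * y[j]
    means = [s / reps for s in sums]
    return [[prods[i][j] / reps - means[i] * means[j] for j in range(n)]
            for i in range(n)]


def variance_of_combination(c, cov):
    """Quadratic form sum_i sum_j c_i c_j v_ij."""
    n = len(c)
    return sum(c[i] * c[j] * cov[i][j] for i in range(n) for j in range(n))


cov = order_stat_cov(3, 100000)
row_sums = [sum(row) for row in cov]
var_equal = variance_of_combination([1 / 3, 1 / 3, 1 / 3], cov)
var_median = variance_of_combination([0.0, 1.0, 0.0], cov)
```

Up to simulation error the row sums should be near 1, `var_equal` should be near 1/3 (the variance of the sample mean), and `var_median` should be larger.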

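The probability of retaining $x_0$ arising from the "best" population when
$D(c_1, \dots, c_n)$ is followed will clearly be given by

(4.1.7) $n!\,(2\pi\sigma^2)^{-(n+1)/2} \int_A \exp\Bigl[-(x_0 - \delta)^2 / 2\sigma^2 - \sum_{i=1}^{n} x_{(i)}^2 / 2\sigma^2\Bigr]\, dx_0 \prod_{i=1}^{n} dx_{(i)},$

where

$$A = \Bigl\{ -\infty < x_{(1)} < \cdots < x_{(n)} < \infty;\ \sum_{i=1}^{n} c_i x_{(i)} - x_0 < \sigma\,t_\alpha(c_1, \dots, c_n);\ -\infty < x_0 < \infty \Bigr\}.$$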


Our object is to show that the expression (4.1.7) is (approximately) maximum
for D. The arguments given in [13], pp. 71-84 and [14] suggest that

$$u(c_1, \dots, c_n) = \sum_{i=1}^{n} c_i Y_{(i)} - Y_0,$$

where the $c_i$'s, $Y_{(i)}$'s and $Y_0$ have the same meaning as in Section 1, may be
assumed to be normally distributed for all practical purposes whatever may be the
value of n. Let the (approximate) normal distribution of $u(c_1, \dots, c_n)$ be
denoted by $N(\xi_c, \sigma_c^2)$. Henceforth we shall consider this distribution to be exactly
normal and hence the result derived below is correct only approximately. For
the special case when all $c_i$'s are equal, i.e., $c_i = 1/n$, i = 1, ..., n, we shall
write $\bar{u}$ for u(1/n, ..., 1/n) and $N(\bar{\xi}, \bar{\sigma}^2)$ for the (exact) distribution of $\bar{u}$. By
Lemma 4.1.1 we know that $\bar{\sigma}$ is the minimum among all $\sigma_c$, where $\sum_1^n c_i = 1$.

In the given situation $\sum_1^n c_i X_{(i)} - X_0 + \delta$ will have the (approximate)
normal distribution $N(\xi_c, \sigma_c^2)$, where mean $\xi_c$ and variance $\sigma_c^2$ are independent of
$\delta$. Hence

(4.1.8) $v(c_1, \dots, c_n) = \dfrac{\sum_1^n c_i X_{(i)} - X_0 + \delta - \xi_c}{\sigma_c}$

will have standard normal distribution N(0, 1).

Hence the expression in (4.1.7) can be written as

(4.1.9) $(2\pi)^{-1/2} \int_{-\infty}^{[\sigma t_\alpha(c_1, \dots, c_n) - \xi_c + \delta]/\sigma_c} e^{-v^2/2}\, dv.$

Also from the definition of $t_\alpha(c_1, \dots, c_n)$ (cf. Section 1) it is now evident that

(4.1.10) $(2\pi)^{-1/2} \int_{[\sigma t_\alpha(c_1, \dots, c_n) - \xi_c]/\sigma_c}^{\infty} e^{-v^2/2}\, dv = \alpha.$

From (4.1.10) it follows that

(4.1.11) $\dfrac{\sigma\,t_\alpha(c_1, \dots, c_n) - \xi_c}{\sigma_c} = \dfrac{\sigma\,\bar{t}_\alpha - \bar{\xi}}{\bar{\sigma}}.$


From (4.1.9) it is easily seen that the probability of retaining $x_0$ in the selected
group under the present situation is an increasing function of $\delta$, a result which
is a particular case of Corollary 3.1.2. Now for any arbitrary $\delta > 0$ the term
in (4.1.9) will be maximum (when the $c_i$'s are varied subject to the conditions
$c_i \geq 0$, $\sum_1^n c_i = 1$) when

(4.1.12) $\dfrac{\sigma\,t_\alpha(c_1, \dots, c_n) - \xi_c + \delta}{\sigma_c} = \dfrac{\sigma\,t_\alpha(c_1, \dots, c_n) - \xi_c}{\sigma_c} + \dfrac{\delta}{\sigma_c}$

is maximum. But $\sigma_c \geq \bar{\sigma}$ (for all $c_i$'s subject to the above restrictions) implies
that $\delta/\bar{\sigma} \geq \delta/\sigma_c$ for any $\delta > 0$. Hence by (4.1.11) and (4.1.12) it follows that
(4.1.12) is maximum for D. Thus the rule D may be taken as the optimum rule.

It is interesting to note the close similarity of this optimum rule D to the usual
(Student's) t-statistic for which a desirable property has been recently derived
by Bahadur [2] while studying two normal populations with a common variance.
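
Under the normal approximation used in this section, the retention probability depends on the weights only through the standard deviation of the auxiliary statistic, since the first term of (4.1.12) equals the upper $\alpha$ point of the standard normal distribution. The sketch below evaluates that approximation; it takes $\sigma = 1$, uses the fact that with equal weights the variance is $1 + 1/n$, and for comparison uses the rule that puts all weight on the largest of n = 2 observations, whose variance is $2 - 1/\pi$. The function names are hypothetical and the comparison rule is only approximately normal.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def retention_prob(delta, sigma_c, alpha):
    """Approximate probability of retaining x0 when its mean exceeds the others by delta."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # upper alpha point of N(0, 1)
    return STD_NORMAL.cdf(z_alpha + delta / sigma_c)


def sigma_equal_weights(n, sigma=1.0):
    """Standard deviation of the mean of n observations minus an independent one."""
    return sigma * math.sqrt(1.0 + 1.0 / n)


alpha = 0.05
sigma_bar = sigma_equal_weights(2)              # equal weights, n = 2
sigma_max = math.sqrt(2.0 - 1.0 / math.pi)      # all weight on the larger of two
p_equal = retention_prob(1.0, sigma_bar, alpha)
p_max = retention_prob(1.0, sigma_max, alpha)
```

Because the equal-weights standard deviation is the smaller of the two, `p_equal` exceeds `p_max` for any positive shift, and both reduce to $1 - \alpha$ when the shift is zero.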
ORDERED FAMILIES OF DISTRIBUTIONS¹


By E. L. LEHMANN 
University of California, Berkeley 


1. Summary and introduction. A comparison is made of several definitions of 
ordered sets of distributions, some of which were introduced earlier by the 
author [7], [8] and by Rubin [10]. These definitions attempt to make precise the 
intuitive notion that large values of the parameter which labels the distributions 
go together with large values of the random variables themselves. Of the various 
definitions discussed the combination of two, (B) and (C) of Section 2, appears 
to be statistically most meaningful. In Section 3 it is shown that this ordering 
implies monotonicity for the power function of sequential probability ratio 
tests. In Section 4 the results are applied to obtaining tests that give a certain 
guaranteed power with a minimum number of observations. Finally, in Section 
5, certain consequences are derived regarding the comparability of experiments 
in the sense of Blackwell [1]. 


2. Some definitions of order. Let $X = (X_1, \ldots, X_n)$ be a random vector
with probability distribution $P_\theta$, depending on a real parameter $\theta$. In the
problems occurring in applications such distributions are usually ordered in the sense,
roughly speaking, that large values of $\theta$ lead on the whole to large values of the
X's. This intuitive notion can be given a precise mathematical meaning in
various ways, some of which we shall now describe.

(A') For any $\theta < \theta'$ there exists a vector-valued function $f = (f_1, \ldots, f_n)$,
depending in general on $\theta$ and $\theta'$, such that²

(i) $x \le f(x)$,

(ii) if X has distribution $P_\theta$, then the distribution of $(f_1(X), \ldots, f_n(X))$ is
$P_{\theta'}$.

This condition, which was used by the author in [7] and [8], states that one
can pass from a random vector with distribution $P_\theta$ to one with distribution
$P_{\theta'}$ by a transformation which increases all of the components of the vector.
An example is the case of a location parameter $\theta$ where one can then put

$f_i(x) = x_i + \theta' - \theta$.

For technical reasons the following slightly weaker condition, which was given
in [8], is sometimes more convenient.

(A) There exists a random vector Z and functions $g = (g_1, \ldots, g_n)$, $g' =
(g'_1, \ldots, g'_n)$ such that

(i) $g(z) \le g'(z)$ for all z,

(ii) the distributions of $g(Z)$ and $g'(Z)$ are $P_\theta$ and $P_{\theta'}$ respectively.


Received September 14, 1954. 


1 This paper was prepared with the partial support of the Office of Naval Research. 
2 Here, as throughout, an inequality between two vectors means that this inequality 
holds for all the components. 
A function $\phi$ defined on an n-dimensional euclidean space is said to be
increasing if $x \le x'$ implies $\phi(x) \le \phi(x')$; a set S is said to be increasing if its
characteristic set function is, that is, if $x \in S$, $x \le x'$ implies $x' \in S$.

Condition (A') is the special case in which Z = X, g is the identity function,
and g' = f. Condition (A) clearly implies:

(B) If $\theta < \theta'$, then for every increasing set³ S

(2.1)    $P_\theta(S) \le P_{\theta'}(S)$,

and also the seemingly stronger

(B') If $\theta < \theta'$, then for every increasing function³ $\phi(x_1, \ldots, x_n)$

(2.2)    $E_\theta \phi(X) \le E_{\theta'} \phi(X)$.


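Actually, (B) and (B') are equivalent. To see this, assume without loss of
generality that $\phi$ is non-negative, and consider the approximation of $\phi$ by a sequence
of nondecreasing simple functions

$\phi_n(x) = \begin{cases} (i-1)/2^n & \text{for } x \in S_i^{(n)}, \quad i = 1, \ldots, n2^n, \\ n & \text{for } x \in S_0^{(n)}, \end{cases}$

where

$S_i^{(n)} = \{x : (i-1)/2^n < \phi(x) \le i/2^n\}$,

$S_0^{(n)} = \{x : \phi(x) > n\}$.

Then it is seen that $E_\theta \phi_n(X)$ can be written in the form

$\sum_i a_i P_\theta(S_i^{(n)} + S_{i+1}^{(n)} + \cdots + S_0^{(n)})$

where the $a_i$ are $\ge 0$. But each set $S_i^{(n)} + \cdots + S_0^{(n)}$ is increasing, and it follows
from condition (B) that $E_\theta \phi_n(X) \le E_{\theta'} \phi_n(X)$ and hence $E_\theta \phi(X) \le E_{\theta'} \phi(X)$.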
A somewhat different condition supposes that all of the distributions $P_\theta$
possess probability densities with respect to a common $\sigma$-finite measure $\mu$.

(C) If $\theta < \theta'$, the probability ratio

(2.3)    $p_{\theta'}(x) / p_\theta(x)$

is increasing.


3 Throughout, we restrict consideration to sets and functions which are Borel measur- 
able. 

4 Probability densities being defined only up to sets of measure zero, condition (C) and 
similar conditions to be considered later, for example in connection with Theorem 3, should 
be interpreted to mean that there exist versions of these densities satisfying the condition 
in question. Furthermore, the condition is not meant to carry any implication as regards the 
points x at which both densities vanish.
Slightly more generally it is enough to assume the existence of real-valued
functions $t_1, \ldots, t_k$ such that

$\dfrac{p_{\theta'}(x)}{p_\theta(x)} = \dfrac{f_{\theta'}(t_1(x), \ldots, t_k(x))}{f_\theta(t_1(x), \ldots, t_k(x))}$.

Then $t_1(x), \ldots, t_k(x)$ are sufficient statistics, and without loss of generality
$f_\theta(t_1, \ldots, t_k)$ may be taken to be the generalized probability density of $T =
(t_1(X), \ldots, t_k(X))$. Condition (C) is therefore essentially a generalization of one
investigated by H. Rubin [10] to the effect that the ratio (2.3) is a monotone
function of a real-valued statistic. We note the obvious lemma:

Lemma 1. If for each x the density $p_\theta(x)$ is a differentiable function of $\theta$, then a
necessary and sufficient condition for (C) to hold is that $\partial/\partial\theta\,(\log p_\theta(x))$ be
nondecreasing.
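The criterion of Lemma 1 can be probed numerically in the one-dimensional case. The following is only a sketch under stated assumptions: the two densities (unit-variance normal and standard Cauchy, both with location parameter $\theta$), the grid, the single value $\theta = 0$ and all function names are illustrative choices, and a finite grid can suggest but not prove monotonicity.

```python
import math

def log_normal(x, theta):
    # log density of N(theta, 1)
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi)

def log_cauchy(x, theta):
    # log density of the Cauchy distribution with location theta
    return -math.log(math.pi) - math.log1p((x - theta) ** 2)

def score(log_density, x, theta, h=1e-5):
    # central-difference approximation to d/dtheta log p_theta(x)
    return (log_density(x, theta + h) - log_density(x, theta - h)) / (2 * h)

def score_is_nondecreasing(log_density, theta, grid, tol=1e-6):
    # checks the Lemma 1 criterion at one theta, on a finite grid of x values
    values = [score(log_density, x, theta) for x in grid]
    return all(b >= a - tol for a, b in zip(values, values[1:]))

grid = [i / 10 for i in range(-100, 101)]
normal_ok = score_is_nondecreasing(log_normal, 0.0, grid)
cauchy_ok = score_is_nondecreasing(log_cauchy, 0.0, grid)
```

Under these assumptions the normal family passes the check, while the Cauchy family fails it because its score $2(x - \theta)/(1 + (x - \theta)^2)$ decreases for $|x - \theta| > 1$.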

It was pointed out above that (A) implies (B). The following examples show 
that (A) and (B) are not equivalent, and that in general (C) is not directly 
comparable to (A) or (B). 

The situation is summarized in Table I in which the sign + or — indicates 
that the condition in question does or does not hold. 


TABLE I

(A)   (B)   (C)
 +     +     +     Evidently possible
 +     +     -     Example 2.1
 +     -     +     Impossible since (A) implies (B)
 +     -     -     Impossible since (A) implies (B)
 -     +     +     Example 2.2
 -     +     -     Example 2.3
 -     -     +     Example 2.4
 -     -     -     Evidently possible


EXAMPLE 2.1. Let X be a random variable having a Cauchy distribution, with
density

$p_\theta(x) = \dfrac{1}{\pi} \dfrac{1}{1 + (x - \theta)^2}$.

Then if $\theta < \theta'$, the transformation $f(x) = x + (\theta' - \theta)$ shows that (A) holds,
and hence also (B). On the other hand, the ratio $p_{\theta'}(x) / p_\theta(x) \to 1$ as $x \to \pm\infty$,
and hence obviously is not monotone.

EXAMPLE 2.2. Let n = 2, and let the probability be concentrated on the four
squares A, ..., D indicated in Fig. 1a. The conditional distribution over each of
the four squares is assumed uniform under both $\theta$ and $\theta'$. The probabilities of the
squares are given in Table II.
It is easily checked that (B) holds. Also

$r(x_1, x_2) = \dfrac{p_{\theta'}(x_1, x_2)}{p_\theta(x_1, x_2)}$

is larger in A than in either B, C, or D, so that (C) is satisfied. On the other
hand, if there existed vectors $g(Z)$ and $g'(Z)$ with distributions $P_\theta$ and $P_{\theta'}$, and
such that $g(z) \le g'(z)$ for all z, then $g'(z) \in C$ would imply $g(z) \in C$, and hence
$P_{\theta'}(C) \le P_\theta(C)$. Thus (A) does not hold.


Fig. 1a, Fig. 1b [figures not reproduced]


TABLE II

        $P_\theta$     $P_{\theta'}$
A       3/16    12/16
B       6/16    1/16
C       1/16    2/16
D       6/16    1/16
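The arithmetic behind Example 2.2 can be replayed exactly. The sketch below assumes that the four rows of Table II refer to the squares A, B, C, D in that order (an assumption about the row labels); since the densities are uniform on each square, the probability ratio on a square equals the ratio of the two square probabilities. The dictionary names are illustrative.

```python
from fractions import Fraction as F

# Square probabilities as read from Table II; the row order A, B, C, D is assumed.
p_theta = {"A": F(3, 16), "B": F(6, 16), "C": F(1, 16), "D": F(6, 16)}
p_theta_prime = {"A": F(12, 16), "B": F(1, 16), "C": F(2, 16), "D": F(1, 16)}

# Value of r(x1, x2) on each square (constant there, because both
# conditional distributions are uniform over the square).
ratio = {name: p_theta_prime[name] / p_theta[name] for name in p_theta}

largest = max(ratio, key=ratio.get)
```

With these entries the ratio is 4 on A, 2 on C and 1/6 on B and D, and $P_{\theta'}(C) > P_\theta(C)$, which is the inequality that rules out (A) in the argument above.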


Here the parameter $\theta$ takes on only two values. We obtain an example in
which $\theta$ ranges over a continuum by means of the following lemma.
Lemma 2. Let $P_0$ and $P_1$ be two probability distributions, and let

$P_\theta = \theta P_1 + (1 - \theta) P_0$,    $0 \le \theta \le 1$.

Then each of the conditions (A), (B), and (C) holds for all $0 \le \theta < \theta' \le 1$ if and
only if it holds for the pair $\theta = 0$, $\theta' = 1$.
Proof. A direct calculation shows that if (B) or (C) holds for the pair $\theta = 0$,
$\theta' = 1$, it holds for all $\theta < \theta'$. To prove this for (A), suppose that $f_i(Z)$ has
distribution $P_i$ (i = 0, 1) and that $f_0(z) \le f_1(z)$ for all z. Consider a random variable
U, uniformly distributed on [0, 1], and let

$X_\theta = g(U, Z) = \begin{cases} f_1(Z) & \text{if } U \le \theta, \\ f_0(Z) & \text{if } \theta < U, \end{cases}$

$X_{\theta'} = g'(U, Z) = \begin{cases} f_1(Z) & \text{if } U \le \theta', \\ f_0(Z) & \text{if } \theta' < U. \end{cases}$

Then $X_\theta$ and $X_{\theta'}$ have distributions $P_\theta$ and $P_{\theta'}$ respectively, and $g(u, z) \le
g'(u, z)$ for all u and z.

The required example is now obtained by taking for $P_0$ and $P_1$ the probabilities
denoted in the example by $P_\theta$ and $P_{\theta'}$, and by defining $P_\theta$ as in the lemma.
This remark applies also to the examples that follow.
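The coupling used in the proof of Lemma 2 for condition (A) can be imitated by simulation. This is a sketch only: the choices $f_0(z) = z$, $f_1(z) = z + 1$, a standard normal Z drawn independently of U, the values $\theta = 0.3$, $\theta' = 0.7$ and the seed are all assumptions made for illustration.

```python
import random

def f0(z):
    # f0(Z) plays the role of a variable with distribution P_0
    return z

def f1(z):
    # f1(Z) plays the role of a variable with distribution P_1; f0 <= f1 everywhere
    return z + 1.0

def g(u, z, theta):
    # take the P_1 representative when u <= theta, the P_0 one otherwise
    return f1(z) if u <= theta else f0(z)

rng = random.Random(0)
theta, theta_prime = 0.3, 0.7
draws = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(10000)]

# The same (u, z) is fed to both parameter values, as in the proof.
pairs = [(g(u, z, theta), g(u, z, theta_prime)) for u, z in draws]
ordered = all(a <= b for a, b in pairs)
share_from_p1 = sum(1 for u, _ in draws if u <= theta) / len(draws)
```

Every simulated pair is ordered, and the share of draws taken from $P_1$ under $\theta$ is close to $\theta$, in line with $P_\theta = \theta P_1 + (1 - \theta) P_0$.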

EXAMPLE 2.3. In Fig. 1a of Example 2.2 replace the square A by two squares
$A_1$, $A_2$ as indicated in Fig. 1b. Let the probabilities $P_\theta(A) = 3/16$ and $P_{\theta'}(A) =
12/16$ be divided among $A_1$ and $A_2$ so that


P,(Ai) = Ys, P,(A2) = 1s; Py(Ai) = Ys; Py (A) Be Ys: 


Then as before (A) does not hold and (B) does. However, (C) now also does not 
hold since the ratio r(x; , x2) has the value 2 in region C but only the value 3 
in region A; . 

EXAMPLE 2.4. The $(x_1, x_2)$-plane is divided into 6 parts $A_1$, $A_2$, $B_1$, $B_2$, $C_1$,
$C_2$ as indicated in Fig. 2. The probability ratio and the probabilities under $\theta$ and
$\theta'$ of the six sets are given in Table III.


A_1 | A_2
B_1 | B_2
C_1 | C_2

Fig. 2

TABLE III
It is seen that (C) holds. On the other hand P(A, + B, + C.) = 
Py(Az + Bz + C2) = .24, so that (B) and hence (A) is not satisfied. 

The rather chaotic state of things indicated by these examples is replaced by 
a much simpler one if the components of the vector X are independent, though 
not necessarily identically distributed. 

THEOREM 1. If $X_1, \ldots, X_n$ are independent, then

(C) $\Rightarrow$ (B) $\Leftrightarrow$ (A).

Proof. Consider first the case n = 1. Suppose that (B) holds, and let $F_\theta$ and
$F_{\theta'}$ denote the cumulative distribution functions of the distributions $P_\theta$ and
$P_{\theta'}$ respectively. If $g(z) = F_\theta^{-1}(z)$, $g'(z) = F_{\theta'}^{-1}(z)$, and Z is uniformly distributed
on [0, 1], then $g(Z)$ and $g'(Z)$ have the distributions $F_\theta$ and $F_{\theta'}$ respectively.
That $g(z) \le g'(z)$ follows from the fact that $F_{\theta'}(x) \le F_\theta(x)$ for all x since (B)
is assumed to hold.
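The construction in the case n = 1 is the familiar inverse-distribution-function coupling. As a sketch, assume the exponential scale family $F_\theta(x) = 1 - e^{-x/\theta}$, $x \ge 0$ (an illustrative family, not one singled out in the text), for which $F_\theta^{-1}(z) = -\theta \log(1 - z)$.

```python
import math

def quantile(z, theta):
    # inverse of F_theta(x) = 1 - exp(-x / theta) for 0 <= z < 1
    return -theta * math.log1p(-z)

theta, theta_prime = 1.0, 2.5
zs = [i / 1000 for i in range(1000)]  # grid of values of Z in [0, 1)

# g(z) and g'(z) evaluated at the same z
coupled = [(quantile(z, theta), quantile(z, theta_prime)) for z in zs]
ordered = all(a <= b for a, b in coupled)
```

Since $F_{\theta'}(x) \le F_\theta(x)$ for this family when $\theta < \theta'$, the two quantile functions are ordered pointwise, which is what the list `coupled` exhibits on the grid.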

To show that (C) implies (B) when n = 1, let $r(x) = p_{\theta'}(x) / p_\theta(x)$. Given any
constant k there exists a number $\rho$ between 0 and 1 such that

(2.4)    $P_\theta\{X > k\} = P_\theta\{r(X) > r(k)\} + \rho P_\theta\{r(X) = r(k)\}$.

It is then easily seen that (2.4) holds, with the same $\rho$, also when $\theta$ is replaced by
$\theta'$. Consider now the problem of testing $\theta$ against $\theta'$, at the level of significance
$\alpha$, which is the value of the probability (2.4). Then the critical function, given by

$\phi(x) = \begin{cases} 1 & \text{if } r(x) > r(k), \\ \rho & \text{if } r(x) = r(k), \\ 0 & \text{if } r(x) < r(k), \end{cases}$

has size $\alpha$, and is the most powerful level $\alpha$ test for testing $\theta$ against $\theta'$. It follows
by comparison with the test $\phi(x) \equiv \alpha$ that

$P_\theta\{r(X) > r(k)\} + \rho P_\theta\{r(X) = r(k)\} \le P_{\theta'}\{r(X) > r(k)\} + \rho P_{\theta'}\{r(X) = r(k)\}$

and hence that for each k,

$P_\theta\{X > k\} \le P_{\theta'}\{X > k\}$.

The same relation for $X \ge k$ follows by a limiting argument.
Suppose now that n > 1 and that (B) holds. Then in particular

(2.5)    $P_\theta\{X_i > k\} \le P_{\theta'}\{X_i > k\}$ for all k

and it follows from the case n = 1 that (A) is satisfied.

Finally let n > 1, $p_\theta(x_1, \ldots, x_n) = f_\theta^{(1)}(x_1) \cdots f_\theta^{(n)}(x_n)$, and assume (C) to be
satisfied. Then for each i, $f_{\theta'}^{(i)}(x_i) / f_\theta^{(i)}(x_i)$ is nondecreasing in $x_i$ as is seen by
holding the other coordinates fixed. It follows from the case n = 1 that (2.5)
holds, and the proof is complete.

We shall in the present paper be mainly concerned with families of
distributions that are ordered in the sense that both conditions (B) and (C) hold. It is
a consequence of Theorem 1 that this is the case in particular if $X_1, \ldots, X_n$ is
a sample from a univariate distribution with density $f_\theta(x)$ where for $\theta < \theta'$ the
ratio $f_{\theta'}(x) / f_\theta(x)$ is nondecreasing in x.


3. Monotonicity of the power function of some sequential tests. As a first
application we consider the problem of testing sequentially the hypothesis
$\theta \le \theta_0$ against the alternatives $\theta \ge \theta_1$, where $\theta_0 < \theta_1$. Wald proposes as a
solution the sequential probability ratio test, according to which observations are
taken as long as

(3.1)    $a < \sum_{i=1}^{m} \log \dfrac{p_{\theta_1}(x_i)}{p_{\theta_0}(x_i)} < b$.

At the first violation of (3.1) the hypothesis is accepted or rejected according as
the probability ratio is then $\le a$ or $\ge b$.
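The stopping rule (3.1) is easily written out. The following sketch assumes independent observations from a unit-variance normal distribution with $\theta_0 = 0$, $\theta_1 = 1$ and boundaries a = -2, b = 2; the function names, the boundaries and the two data sequences are hypothetical and serve only to show the mechanics.

```python
def sprt(observations, log_ratio, a, b):
    """Run the rule (3.1); return (decision, number of observations used)."""
    total = 0.0
    for m, x in enumerate(observations, start=1):
        total += log_ratio(x)      # cumulative log probability ratio
        if total <= a:             # lower boundary reached: accept the hypothesis
            return "accept", m
        if total >= b:             # upper boundary reached: reject the hypothesis
            return "reject", m
    return "undecided", len(observations)

def normal_log_ratio(theta0, theta1):
    # log p_theta1(x) / p_theta0(x) for N(theta, 1) observations
    return lambda x: (theta1 - theta0) * x - 0.5 * (theta1 ** 2 - theta0 ** 2)

log_ratio = normal_log_ratio(0.0, 1.0)
decision_high = sprt([1.2, 0.9, 1.5, 1.1, 1.4, 0.8], log_ratio, a=-2.0, b=2.0)
decision_low = sprt([-0.3, -0.5, 0.1, -0.8, -0.2, -0.4], log_ratio, a=-2.0, b=2.0)
```

With these inputs the first sequence crosses the upper boundary and the second the lower boundary, each at the third observation.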

Wald mentions ([12], p. 73) that in many important special cases the power
function $\beta(\theta)$ of this test is an increasing function of $\theta$. If a and b are adjusted so
that $\beta(\theta_0) = \alpha$ and $\beta(\theta_1) = \beta$, this then implies that $\beta(\theta) \le \alpha$ for $\theta \le \theta_0$ and
$\beta(\theta) \ge \beta$ for $\theta \ge \theta_1$, and hence satisfactory control of the probabilities of both
kinds of error. The following result establishes such monotonicity for a large
class of problems. The test treated is the generalized probability ratio test, where
in (3.1) the constant boundaries a, b are replaced by variable boundaries, say
$a_m$ and $b_m$, and where some of the strong or weak inequality signs defining the
test may be replaced by weak or strong ones respectively. This includes in
particular the case of a single sample, or, more generally, of truncated sampling
schemes if at some stage $a_m = b_m$.

THEOREM 2. Let $X_1, X_2, \ldots$ be a sequence of random variables such that for all
m the joint density $p_\theta^{(m)}(x_1, \ldots, x_m)$ of $X_1, \ldots, X_m$ satisfies (B) and (C). Then
the power function $\beta(\theta)$ of any generalized probability ratio test is nondecreasing.

Proof. Let

$Z_m = \dfrac{p_{\theta_1}^{(m)}(x_1, \ldots, x_m)}{p_{\theta_0}^{(m)}(x_1, \ldots, x_m)}$.

Then for $\theta < \theta'$ we have that for all k

$P_\theta\{Z_m > k\} \le P_{\theta'}\{Z_m > k\}$.

This follows from the fact that by (C), the set

$\left\{(x_1, \ldots, x_m) : \dfrac{p_{\theta_1}^{(m)}(x_1, \ldots, x_m)}{p_{\theta_0}^{(m)}(x_1, \ldots, x_m)} > k\right\}$

is increasing, and that by (B) the probability of an increasing set is monotone in
$\theta$. Since $Z_m$ is real-valued, there exists by Theorem 1 a real-valued function $f_m$
such that $f_m(z) \ge z$ for all z, and the distribution of $f_m(Z_m)$ is given by

$P_\theta\{f_m(Z_m) \le u\} = P_{\theta'}\{Z_m \le u\}$ for all u.


Consider now the points $(1, Z_1), (2, Z_2), \ldots$ and the path they describe in the
(i, Z)-plane. With the generalized probability ratio test, observations are taken
as long as this path lies within a certain prescribed band, and the hypothesis is
accepted or rejected according as the path leaves the band for the first time
through the lower or upper boundary. Now the path $\mathcal{C}'$ formed by the points
$(1, f_1(Z_1)), (2, f_2(Z_2)), \ldots$ lies entirely above the path $\mathcal{C}$ formed by the points
$(i, Z_i)$, and hence whenever $\mathcal{C}$ leads to rejection by leaving the band through the
upper boundary, so does $\mathcal{C}'$. But the probability of $\mathcal{C}$ and $\mathcal{C}'$ leading to rejection is
exactly $\beta(\theta)$ and $\beta(\theta')$ respectively, which completes the proof.

It may be worth noting that use was made of condition (C) only for the pair
of values $(\theta_0, \theta_1)$.

Some simple applications of this theorem are to cases in which $X_1, X_2, \ldots$
are independently, identically distributed random variables, with probability
density $f_\theta(x)$ for which $f_{\theta'}(x) / f_\theta(x)$ is nondecreasing in x whenever $\theta < \theta'$. In all
such cases it follows from Theorems 1 and 2 that the power function of a
generalized probability ratio test is nondecreasing.

EXAMPLE 3.1. Let the density of the X's be given by

$f_\theta(x) = \theta g(x) + (1 - \theta) h(x)$,    $0 \le \theta \le 1$.

This is the situation in which the population under investigation is a mixture of
two populations. In an experiment, for example, there may be the possibility of
"gross errors" in addition to normal errors. Or it may be the problem of detecting
the frequency of mutation of some gene, the effect of which is not directly
observable. Since

$\dfrac{f_{\theta'}(x)}{f_\theta(x)} = \dfrac{\theta' \left[\dfrac{g(x)}{h(x)} - 1\right] + 1}{\theta \left[\dfrac{g(x)}{h(x)} - 1\right] + 1}$

it is seen that for $\theta < \theta'$ this ratio is increasing in x provided this is the case for
$g(x) / h(x)$.
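The identity for the mixture ratio and its monotonicity can be checked on a grid. In this sketch g and h are taken to be the N(1, 1) and N(0, 1) densities, so that $g(x)/h(x) = e^{x - 1/2}$ is increasing; these densities, the values $\theta = 0.2$, $\theta' = 0.6$ and the function names are assumptions chosen for illustration.

```python
import math

def g(x):
    # assumed component density: N(1, 1)
    return math.exp(-0.5 * (x - 1.0) ** 2) / math.sqrt(2 * math.pi)

def h(x):
    # assumed component density: N(0, 1)
    return math.exp(-0.5 * x ** 2) / math.sqrt(2 * math.pi)

def mixture(x, theta):
    return theta * g(x) + (1 - theta) * h(x)

def ratio_direct(x, theta, theta_prime):
    return mixture(x, theta_prime) / mixture(x, theta)

def ratio_formula(x, theta, theta_prime):
    # the displayed expression in terms of q = g(x) / h(x)
    q = g(x) / h(x)
    return (theta_prime * (q - 1) + 1) / (theta * (q - 1) + 1)

grid = [i / 10 for i in range(-50, 51)]
theta, theta_prime = 0.2, 0.6
direct = [ratio_direct(x, theta, theta_prime) for x in grid]
formula = [ratio_formula(x, theta, theta_prime) for x in grid]

agree = all(abs(d - f) <= 1e-9 * max(1.0, abs(d)) for d, f in zip(direct, formula))
nondecreasing = all(b >= a - 1e-12 for a, b in zip(direct, direct[1:]))
```

Both checks succeed for this choice; replacing g and h by densities whose ratio is not monotone would in general break the second one.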


EXAMPLE 3.2. Let

(3.2)    $f_\theta(x) = g(x - \theta)$.

Then (A) clearly holds without any restriction on the function g. On the other
hand, (C) is exactly the condition of twice positivity of Schoenberg [11], a
real-valued measurable function g being m times positive if, for every k (= 1, ..., m),
$u_1 < u_2 < \cdots < u_k$, $v_1 < v_2 < \cdots < v_k$ implies that the determinant

$\det \| g(u_i - v_j) \| \ge 0$.

A trivial specialization of Lemma 1 of [11] shows that a probability density g is
twice positive if and only if (i) its domain of positivity is an interval (a, b),
$-\infty \le a < b \le \infty$, (ii) the function $-\log g$ is convex (and hence automatically
continuous) in the open interval (a, b), and if g is correctly defined at the end
points, as can always be achieved.⁵
As specific examples, let

$g_1(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$    (Normal)
g_2(x)
$g_3(x) = e^{-x}$, $x \ge 0$    (Exponential)
g_4(x) =     (Rectangular)
g_5(x) =     (Logistic)
g_6(x) =     (Laplace)
$g_7(x) = \dfrac{1}{\pi} \dfrac{1}{1 + x^2}$    (Cauchy)
In the first six of these cases $-\log g$ is convex while in the last it is not. A general
class of densities of form (3.2) that satisfy condition (C) is formed by the cases
in which g is a Pólya frequency function. This class was defined and investigated
by Schoenberg (see for example [11]) who showed these functions to be totally
positive (that is, k times positive for all k = 1, 2, ...) and hence in particular
twice positive.
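Convexity of $-\log g$ can be examined through second differences. The sketch below uses the customary standardized forms of the normal, logistic, Laplace and Cauchy densities; these forms are assumptions (the displayed formulas above are incomplete), and a grid check is indicative rather than a proof.

```python
import math

# -log g for four densities, each in an assumed standard form.
neg_log_density = {
    "normal": lambda x: 0.5 * x * x + 0.5 * math.log(2 * math.pi),
    "logistic": lambda x: x + 2.0 * math.log1p(math.exp(-x)),
    "laplace": lambda x: abs(x) + math.log(2.0),
    "cauchy": lambda x: math.log(math.pi) + math.log1p(x * x),
}

def is_convex_on_grid(f, grid, tol=1e-9):
    # a convex function has nonnegative second differences on an equally spaced grid
    return all(f(a) + f(c) - 2.0 * f(b) >= -tol
               for a, b, c in zip(grid, grid[1:], grid[2:]))

grid = [i / 10 for i in range(-80, 81)]
convex = {name: is_convex_on_grid(f, grid) for name, f in neg_log_density.items()}
```

The first three pass and the Cauchy case fails, since $\log(1 + x^2)$ is concave for $|x| > 1$.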

EXAMPLE 3.3. Let

(3.3)    $f_\theta(x) = \dfrac{1}{\theta}\, g\!\left(\dfrac{x}{\theta}\right)$,

where g is an even function, and where, without loss of generality, one may
restrict x to be nonnegative since the absolute values $|X_1|, |X_2|, \ldots$ form a set
of sufficient statistics for $\theta$. It is then seen as in the previous example, or can be
deduced from it by transforming to Y = log X, that (C) holds if and only if
the domain of positivity of g is an interval (a, b) and $-\log g(e^x)$ is convex for
log a < x < log b. This holds in the cases $g_1$, $g_5$, $g_6$ and $g_7$ of the previous
example. Since the convexity of $-\log g(x)$ implies that of $-\log g(e^x)$ but not
conversely, condition (C) in the case of an even function is more restrictive for a
location parameter than for a scale parameter.

EXAMPLE 3.4. A well-known example, which satisfies also the stronger
conditions investigated by Rubin [10], is that of an exponential family, with

$f_\theta(x) = a(\theta)\, e^{b(\theta) x}\, h(x)$


5 The same condition was encountered in a slightly different context by Ruist, "Comparison
for tests of nonparametric hypotheses," Arkiv för Matematik, Vol. 3 (1954), pp. 133-163.
Logarithmically convex functions have also been considered by Artin in his
"Einführung in die Theorie der Gammafunktion," Hamburger Math. Einzelschriften, No. 11,
B. G. Teubner, Leipzig, (1931).
where $b(\theta)$ is a strictly increasing function of $\theta$. This includes among others
the binomial and Poisson families of distributions. It also includes the cases
$f_\theta(y) = g(y - \theta)$ where g is one of the densities $g_1$, $g_2$ of Example 3.2, in the
first case with y = x and in the second with y = e~ ~. Still further special cases 
are obtained by putting $f_\theta(y) = (1/\theta)\, g(y/\theta)$ with g one of the functions $g_1$, $g_3$
or $g_6$ of Example 3.2 and $y = -x^2$, $y = -x$ and $y = -|x|$ respectively.

Without going into details we mention as further application of Theorem 2
some sequential tests of composite hypothesis, discussed among others by Wald
[12], Cox [3], Johnson [4], such as the sequential t-tests or sequential analysis
of variance tests. In those cases the variables $X_1, X_2, \ldots$ are dependent.
That (C) holds follows from the fact that the noncentral t- and F-distributions
satisfy (C) (see Section 4, Examples 4.3 and 4.4), while (B) is easily checked
in all these cases.


4. Tests with guaranteed power. As another application consider the problem
of testing that $\theta \le \theta_0$ against the alternatives $\theta \ge \theta_1$ on the basis of $X =
(X_1, \ldots, X_n)$. It is desired to find that test which, subject to

(4.1)    $\beta(\theta) \le \alpha$ for $\theta \le \theta_0$,

maximizes the minimum power over $\theta \ge \theta_1$, that is, which gives the greatest
possible guaranteed power in that range. The solution to this problem is to
determine a least favorable pair of distributions $\lambda_0$, $\lambda_1$ over the sets $\omega_0 =
\{\theta : \theta \le \theta_0\}$ and $\omega_1 = \{\theta : \theta \ge \theta_1\}$, and to reject the hypothesis when

(4.2)    $\dfrac{\int p_\theta(x_1, \ldots, x_n)\, d\lambda_1(\theta)}{\int p_\theta(x_1, \ldots, x_n)\, d\lambda_0(\theta)} \ge k$.

If the family of distributions is ordered, it seems reasonable to expect that the
least favorable distributions are those assigning probability 1 to the points
$\theta_0$ and $\theta_1$ respectively, in which case (4.2) reduces to the probability ratio test

(4.3)    $\dfrac{p_{\theta_1}(x_1, \ldots, x_n)}{p_{\theta_0}(x_1, \ldots, x_n)} \ge k$.

It follows from Theorem 8.3 of [9] that (4.3) is the solution to the stated problem
provided $\beta(\theta) \le \beta(\theta_0) = \alpha$ for $\theta \le \theta_0$ and $\beta(\theta) \ge \beta(\theta_1)$ for $\theta \ge \theta_1$. But this
is certainly the case if $\beta(\theta)$ is nondecreasing. A sufficient condition for this is
that both (B) and (C) hold since then the critical region (4.3) is increasing and
hence its probability is a nondecreasing function of $\theta$. (Actually, this is a special
case of Theorem 2.) That (B) alone is not enough is seen, for example, in the
Cauchy case. If $X_1, \ldots, X_n$ are independently, identically distributed with
density $\pi^{-1} / (1 + (x - \theta)^2)$, it is seen that the region in which

$\prod_{i=1}^{n} \dfrac{1 + (x_i - \theta_0)^2}{1 + (x_i - \theta_1)^2} > k$

is a bounded set in n-space. Its probability therefore tends to zero as $\theta \to \infty$.
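The Cauchy phenomenon is already visible for n = 1. As a sketch, assume $\theta_0 = 0$, $\theta_1 = 1$ and k = 2 (illustrative values); the inequality $(1 + x^2)/(1 + (x - 1)^2) > 2$ then reduces to $(x - 1)(x - 3) < 0$, so the rejection region is the interval (1, 3), and its probability under $\theta$ follows from the Cauchy distribution function.

```python
import math

def likelihood_ratio(x, theta0=0.0, theta1=1.0):
    # p_theta1(x) / p_theta0(x) for a single Cauchy observation
    return (1.0 + (x - theta0) ** 2) / (1.0 + (x - theta1) ** 2)

def cauchy_cdf(x, theta):
    return 0.5 + math.atan(x - theta) / math.pi

def power(theta, lo=1.0, hi=3.0):
    # probability under theta of the rejection region (lo, hi)
    return cauchy_cdf(hi, theta) - cauchy_cdf(lo, theta)

power_at_0 = power(0.0)
power_at_2 = power(2.0)
power_at_100 = power(100.0)
```

Under these assumptions the power rises from about 0.15 at $\theta = 0$ to 0.5 at $\theta = 2$ and then falls toward zero, so the power function of the probability ratio test is not monotone.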
A limiting case, as $\theta_1 \to \theta_0$, of the property that the minimum power over
$\theta \ge \theta_1$ be a maximum is that of locally maximum power. Here one seeks the
test which maximizes the derivative $\beta'(\theta)$ of the power function at $\theta = \theta_0$.
If for any critical region w, the integral

$\beta(\theta) = \int_w p_\theta(x)\, d\mu(x)$

can be differentiated under the integral sign with respect to $\theta$, the problem
becomes that of maximizing

$\beta'(\theta_0) = \int_w \left[\dfrac{\partial}{\partial\theta} \log p_\theta(x)\right]_{\theta = \theta_0} p_{\theta_0}(x)\, d\mu(x)$

subject to (4.1). If we again tentatively replace (4.1) by the side condition
$\beta(\theta_0) = \alpha$, the best critical region by the Neyman-Pearson fundamental lemma
is given by

(4.4)    $\left.\dfrac{\partial}{\partial\theta} \log p_\theta(x)\right|_{\theta = \theta_0} \ge k$.

If (C) holds, it was seen earlier that the left-hand side of (4.4) is a nondecreasing
function of the x's. Hence it follows from (B) that $\beta(\theta) \le \beta(\theta_0) = \alpha$ for $\theta \le \theta_0$
and therefore that (4.4) is the desired result.

Let $X_1, \ldots, X_n$ be independently and identically distributed with density
$f_\theta(x)$, which is either a mixture of two densities in proportion $\theta : 1 - \theta$, or where
$\theta$ is a location or scale parameter, and suppose that the conditions of Examples
3.1-3.3 respectively are satisfied. Then the test maximizing the minimum power
over $\theta \ge \theta_1$ is given by the rejection region

(4.5)    $\dfrac{f_{\theta_1}(x_1) \cdots f_{\theta_1}(x_n)}{f_{\theta_0}(x_1) \cdots f_{\theta_0}(x_n)} \ge k$

and the test maximizing the power locally by

(4.6)    $\sum_{i=1}^{n} \left.\dfrac{\partial}{\partial\theta} \log f_\theta(x_i)\right|_{\theta = \theta_0} \ge k'$.

A uniformly most powerful one-sided test does of course usually not exist.
A notable exception is the well-known case of the exponential family of
Example 3.4.

As an illustration consider the case that $f_\theta(x) = g(x - \theta)$ where g is one of
the densities $g_i$ (i = 1, ..., 7) of Example 3.2, and that $\theta_0 = 0$. For i = 1, 2
these are exponential families, and the test given by (4.6) is uniformly most
powerful against the alternatives $\theta > 0$. The same conclusion holds also for
i = 3 since in that case $Y = \min(X_1, \ldots, X_n)$ is a sufficient statistic with
density $n \exp[-n(y - \theta)]$ for $y \ge \theta$. The case i = 4 is interesting in that again
a uniformly most powerful one-sided test exists, although the minimal sufficient
statistic is (Y, Z), with $Y = \min_i X_i$, $Z = \max_i X_i$, and hence two-dimensional.
The explanation is that the statistic Y by itself is sufficient for $\theta \ge 0$ when
attention is restricted to the part of the sample space that is possible when
$\theta = 0$. In the case of the logistic distribution the locally most powerful test
can be written down by substituting in (4.6). It is not uniformly most powerful,
but is unbiased since (C) holds. For i = 6, when the sample is drawn from a
Laplace distribution with unknown location parameter $\theta$, the power function
of a test may not be differentiable. However, it turns out that a locally most
powerful test, in the natural sense of the term, still exists and, perhaps somewhat
surprisingly, is given by the sign test, as will be shown in the appendix. Finally
in the case i = 7, that of a Cauchy distribution, (C) does not hold, and the
locally most powerful test does not seem to have a simple structure even when
n = 1.
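For the logistic case the substitution in (4.6) can be carried out explicitly. The sketch assumes the standard logistic form $g(u) = e^{-u}/(1 + e^{-u})^2$, for which $\partial/\partial\theta\, \log g(x - \theta) = \tanh((x - \theta)/2)$; this form, the sample values and the function names are illustrative assumptions. The sign-test statistic mentioned for the Laplace case is included for comparison.

```python
import math

def logistic_log_density(x, theta):
    # assumed standard logistic location family: g(u) = e^{-u} / (1 + e^{-u})^2
    u = x - theta
    return -u - 2.0 * math.log1p(math.exp(-u))

def logistic_score(x, theta0):
    # closed form of d/dtheta log g(x - theta) at theta = theta0
    return math.tanh((x - theta0) / 2.0)

def locally_most_powerful_statistic(xs, theta0=0.0):
    # left-hand side of (4.6) in the logistic case
    return sum(logistic_score(x, theta0) for x in xs)

def sign_statistic(xs, theta0=0.0):
    # number of observations exceeding theta0 (the sign-test statistic)
    return sum(1 for x in xs if x > theta0)

sample = [0.5, -1.0, 2.0, 0.1]
lmp_value = locally_most_powerful_statistic(sample)
signs = sign_statistic(sample)
```

The test rejects for large values of `lmp_value`; the critical constant k' in (4.6) would still have to be determined from the null distribution, which this sketch does not attempt.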

We now turn to some applications in which the variables $X_1, \ldots, X_n$ are
not independent. Dependence may for example be introduced through the
elimination of nuisance parameters by the principle of invariance or because the
observable variables involve some common unobservable components, and the
joint density of the x's will be a mixture of densities of independent variables.
We first give a sufficient condition for (C) to hold in that case.

THEOREM 3. Let $x = (x_1, \ldots, x_n)$ and let $g_\theta(x, \xi)$ be a family of densities
depending on two real parameters $\theta$ and $\xi$ and jointly measurable in x and $\xi$. For
each $\theta$, let $\lambda_\theta$ be a measure for $\xi$ such that for all x, the integral

$p_\theta(x) = \int g_\theta(x, \xi)\, d\lambda_\theta(\xi)$

exists. Then a sufficient condition for the family of densities $p_\theta(x)$ to satisfy (C)
is that for $\theta < \theta'$ condition (C) holds (i) for $g_\theta(x, \xi)$ when $\xi$ is fixed and $\theta$ is taken
as the parameter, (ii) for $g_\theta(x, \xi)$ when $\theta$ is fixed and $\xi$ is taken as the parameter,
(iii) for $d\lambda_\theta(\xi)$.

Here in assumption (iii) the densities $d\lambda_\theta(\xi)$ and $d\lambda_{\theta'}(\xi)$ may be computed
with respect to any $\sigma$-finite measure $\nu$ that dominates both of the given measures,
since only the ratio of the densities matters. In the proof that follows and later
in the paper we shall therefore denote this ratio by $d\lambda_{\theta'}(\xi) / d\lambda_\theta(\xi)$. This should
not be taken to imply that $\lambda_{\theta'}$ is absolutely continuous with respect to $\lambda_\theta$, but
should be interpreted as a shorthand notation for $(d\lambda_{\theta'} / d\nu) : (d\lambda_\theta / d\nu)$.

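Proof. We must show that $x \le x'$ implies

(4.7)    $\dfrac{\int g_\theta(x', \xi)\, d\lambda_\theta(\xi)}{\int g_\theta(x, \xi)\, d\lambda_\theta(\xi)} \le \dfrac{\int g_{\theta'}(x', \xi)\, d\lambda_{\theta'}(\xi)}{\int g_{\theta'}(x, \xi)\, d\lambda_{\theta'}(\xi)}$.

Let $\Lambda$ and $\Lambda'$ be the probability distributions given by

$d\Lambda(\xi) = \dfrac{g_\theta(x, \xi)\, d\lambda_\theta(\xi)}{\int g_\theta(x, \xi)\, d\lambda_\theta(\xi)}$,    $d\Lambda'(\xi) = \dfrac{g_{\theta'}(x, \xi)\, d\lambda_{\theta'}(\xi)}{\int g_{\theta'}(x, \xi)\, d\lambda_{\theta'}(\xi)}$.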
These are the a posteriori distributions of $\xi$ given x, corresponding to $\theta$ and
$\theta'$ respectively. Then (4.7) may be rewritten as

(4.8)    $\int \dfrac{g_\theta(x', \xi)}{g_\theta(x, \xi)}\, d\Lambda(\xi) \le \int \dfrac{g_{\theta'}(x', \xi)}{g_{\theta'}(x, \xi)}\, d\Lambda'(\xi)$.

By assumption (i) it is enough to prove that

(4.9)    $\int \dfrac{g_\theta(x', \xi)}{g_\theta(x, \xi)}\, [d\Lambda'(\xi) - d\Lambda(\xi)] \ge 0$.

By assumption (iii) the $\xi$-axis can be divided into two mutually exclusive and
exhaustive intervals $S_-$ and $S_+$ such that $S_-$ lies to the left of $S_+$ and
$d\Lambda'(\xi) / d\Lambda(\xi)$ is $\le 1$ in $S_-$ and $\ge 1$ in $S_+$. We then have that the left-hand
side of (4.9) equals

(4.10)    $a \int_{S_-} [d\Lambda'(\xi) - d\Lambda(\xi)] + b \int_{S_+} [d\Lambda'(\xi) - d\Lambda(\xi)]$

where a and b are mean values of $g_\theta(x', \xi) / g_\theta(x, \xi)$ in $S_-$ and $S_+$ respectively
so that by assumption (ii), $a \le b$. Since $\Lambda$ and $\Lambda'$ are probability measures,
(4.10) becomes

$(b - a) \int_{S_+} [d\Lambda'(\xi) - d\Lambda(\xi)] = (b - a) \int_{S_+} \left[\dfrac{d\Lambda'(\xi)}{d\Lambda(\xi)} - 1\right] d\Lambda(\xi) \ge 0$,

as was to be proved.
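A small discrete instance of Theorem 3 can be computed exactly. The sketch assumes that $\xi$ takes the values 0, 1, 2, that $g(x, \xi)$ is a binomial(4, $q_\xi$) probability with $q_\xi$ increasing in $\xi$ (so (ii) holds and, g being free of $\theta$, (i) is trivial), and that $\lambda_\theta$ puts weight proportional to $\theta^\xi$ on $\xi$ (so (iii) holds); all of these choices and names are illustrative.

```python
from fractions import Fraction as F
from math import comb

M = 4
Q = [F(1, 4), F(1, 2), F(3, 4)]  # success probability attached to xi = 0, 1, 2

def g(x, xi):
    # binomial(M, Q[xi]) probability of x; does not depend on theta
    q = Q[xi]
    return comb(M, x) * q ** x * (1 - q) ** (M - x)

def mixing_weights(theta):
    # lambda_theta: weights proportional to theta ** xi, normalized to sum to 1
    raw = [theta ** xi for xi in range(len(Q))]
    total = sum(raw)
    return [w / total for w in raw]

def p(x, theta):
    # the mixture p_theta(x) = sum over xi of g(x, xi) * lambda_theta(xi)
    return sum(g(x, xi) * w for xi, w in enumerate(mixing_weights(theta)))

theta, theta_prime = F(1, 2), F(2)
ratios = [p(x, theta_prime) / p(x, theta) for x in range(M + 1)]
nondecreasing = all(a <= b for a, b in zip(ratios, ratios[1:]))
```

The ratios $p_{\theta'}(x)/p_\theta(x)$ for x = 0, ..., 4 come out nondecreasing, as the theorem asserts for this instance.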
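COROLLARY. Let $\xi$ be vector-valued, $\xi = (\xi_1, \ldots, \xi_s)$ say, and let

$p_\theta(x) = \int g_\theta(x, \xi)\, d\lambda_\theta(\xi)$.

Suppose that the measure $\lambda_\theta$ is the product of s linear measures $\lambda_\theta = \lambda_\theta^{(1)} \times
\lambda_\theta^{(2)} \times \cdots \times \lambda_\theta^{(s)}$ each of which satisfies condition (iii) of Theorem 3, and that
$g_\theta(x, \xi)$ satisfies condition (i) of this theorem. Suppose that condition (ii) is replaced by
(ii') for each j = 1, ..., s - 1, the ratio

$\dfrac{g_\theta(x'_1, \ldots, x'_n, \xi'_1, \ldots, \xi'_j, \xi_{j+1}, \ldots, \xi_s)}{g_\theta(x_1, \ldots, x_n, \xi_1, \ldots, \xi_j, \xi_{j+1}, \ldots, \xi_s)}$

is nondecreasing in $\xi_{j+1}, \ldots, \xi_s$ provided $x_i \le x'_i$ (i = 1, ..., n) and $\xi_i \le \xi'_i$
(i = 1, ..., j).
Proof. It is seen from Theorem 3 by induction over j that

$g_\theta^{(j)}(x_1, \ldots, x_n, \xi_{j+1}, \ldots, \xi_s) = \int g_\theta(x_1, \ldots, x_n, \xi_1, \ldots, \xi_s)\, d\lambda_\theta^{(1)}(\xi_1) \cdots d\lambda_\theta^{(j)}(\xi_j)$

satisfies conditions (i) and (ii'), and this yields the desired result.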
As an application we consider:
EXAMPLE 4.1. Let $U_1, \ldots, U_s$ be a sample from an unobservable random
variable with density $f_\theta(u)$. What we observe are

$X_{ij} = U_i + V_{ij}$

where the V's are independently normally distributed with mean zero. For
the moment we shall assume the variance of the V's to be known, and hence
without loss of generality to be equal to 1. A typical example is the usual simplest
model II problem in which $\theta$ is a scale parameter. We shall assume that $f_\theta(u)$
is an even function of u and that for $\theta < \theta'$ the ratio $f_{\theta'}(u) / f_\theta(u)$ is an increasing
function of |u|, and consider the problem of testing $\theta \le \theta_0$ against $\theta \ge \theta_1$. The
joint density of the X's is given by

$p_\theta(x) = \prod_{i=1}^{s} \int (2\pi)^{-n/2} \exp\left[-\tfrac{1}{2} \sum_{j=1}^{n} (x_{ij} - u_i)^2\right] f_\theta(u_i)\, du_i$.

Therefore the absolute values of the means $\bar{x}_1, \ldots, \bar{x}_s$ constitute a set of
sufficient statistics for $\theta$, and we may restrict attention to them. Putting
$y_i = \sqrt{n}\, \bar{x}_i$ and $\xi_i = \sqrt{n}\, u_i$, we obtain the joint density of the Y's as

$p_\theta(y_1, \ldots, y_s) = C \prod_{i=1}^{s} \int \exp\left(-\tfrac{1}{2}(y_i - \xi_i)^2\right) f_\theta(\xi_i / \sqrt{n})\, d\xi_i$.

We shall now prove condition (C) for the density of the |Y|'s. Since we are
dealing with a sample it is enough to check this for the case s = 1. We have

(4.11)    $p_\theta(y) = C e^{-y^2/2} \int_0^\infty (e^{\xi y} + e^{-\xi y})\, e^{-\xi^2/2} f_\theta(\xi / \sqrt{n})\, d\xi$.

Condition (iii) of Theorem 3 is satisfied by assumption, and we need only check
(i) and (ii) with

$g_\theta(y, \xi) = (e^{\xi y} + e^{-\xi y})\, e^{-\xi^2/2}$ for $\xi > 0$.

Since this is independent of $\theta$, assumption (i) clearly holds. Examining (ii) we
have

$\dfrac{g_\theta(y', \xi)}{g_\theta(y, \xi)} = \dfrac{e^{\xi y'} + e^{-\xi y'}}{e^{\xi y} + e^{-\xi y}}$.

Now if $|y| \le |y'|$, it is easily checked that $(e^{\xi y'} + e^{-\xi y'}) / (e^{\xi y} + e^{-\xi y})$ is an
increasing function of $|\xi|$, and this completes the proof of (C). It follows that the
test which rejects when

$\dfrac{p_{\theta_1}(y_1) \cdots p_{\theta_1}(y_s)}{p_{\theta_0}(y_1) \cdots p_{\theta_0}(y_s)} \ge k$,

where $p_\theta(y)$ is given by (4.11), maximizes the minimum power for testing $\theta \le \theta_0$
against $\theta \ge \theta_1$.
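The monotonicity claimed for the ratio in (ii) is that of $\cosh(\xi y') / \cosh(\xi y)$ in $\xi \ge 0$ when $|y| \le |y'|$. A grid check is sketched below; the values y = 0.7, y' = 1.9, the grid and the function names are illustrative assumptions, and logarithms are used to avoid overflow for large arguments.

```python
import math

def log_cosh(t):
    # log cosh(t), written so that large |t| does not overflow
    t = abs(t)
    return t + math.log1p(math.exp(-2.0 * t)) - math.log(2.0)

def log_ratio(xi, y, y_prime):
    # log of (e^{xi y'} + e^{-xi y'}) / (e^{xi y} + e^{-xi y})
    return log_cosh(xi * y_prime) - log_cosh(xi * y)

grid = [i / 20 for i in range(201)]  # xi from 0 to 10
values = [log_ratio(xi, 0.7, 1.9) for xi in grid]
increasing = all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
```

The derivative of the log ratio in $\xi$ is $y' \tanh(\xi y') - y \tanh(\xi y)$, which is nonnegative for $\xi \ge 0$ when $0 \le y \le y'$; the grid values reflect this.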
We next consider the following somewhat more realistic case.

EXAMPLE 4.2. Suppose that the assumptions of Example 4.1 hold but that
the variance $\sigma^2$ of the X's is unknown. We assume further that the unknown
parameter in the distribution of the U's is a scale parameter, say $\tau$. The problem
is to test $\tau/\sigma \le \theta_0$ against $\tau/\sigma \ge \theta_1$. Putting $\theta = \tau/\sigma$ the joint probability
density of the $X_{ij}$ is


II / es exp | -3p > (ey - u)*| ~j ) du;. 


Here the statistics $V = \sum\sum (X_{ij} - \bar{X}_i)^2$, $\bar{X}_1, \ldots, \bar{X}_s$ are jointly sufficient.
Putting $Y_i = \sqrt{n}\, \bar{X}_i$, $\xi_i = \sqrt{n}\, u_i$, the joint density of V and the Y's is given by
ez 


— ,(N—#)/2—1 aE p al is . a a: Ue 
oft exp l | I] / exp l 50% (yi — &) ] act (7 53) dé;. 


Now the problem of testing $\theta \le \theta_0$ against $\theta \ge \theta_1$ remains invariant under
multiplication of the Y's by a common positive constant a and of V by $a^2$,
and there exists a solution to the given problem which is invariant under these
transformations. We may therefore restrict attention to the maximal invariant
$(z_1, \ldots, z_s)$ where $z_i = y_i / \sqrt{v}$. The joint density of the Z's is given by


e 


of oe [TT J exp (teva — e155 (gre) ae | 


= [ se Ca) [cow (—4 > &) [ ve" 


I] (efi Ve 4 ei tive) av] dé ee dé, . 

Denoting the expression in brackets by $g(z, \xi)$ we shall now show that
$g(z', \xi) / g(z, \xi)$ is increasing in $\xi$ for $z \le z'$, the other two conditions of Theorem
3 being satisfied as before. To prove that $g(z, \xi)$ has the desired property we
apply once more Theorem 3 with z, v, $\xi$ playing the role of $\theta$, $\xi$, x in this order.
The weight function for u being d\(v) = Cv’e~*” independent of z, condition 
(iii) is satisfied. Putting 


h.(v, §) = C exp [—3). &] II (eta 4 gt tt Ve) 


it is enough to show that h,-(v, &)/h,(v, —) is increasing in v and ¢~ and 
h.(v, =’) / h.(v, &) in —& where the é’s are assumed to be nonnegative and where 
\z,| < |z,| fori = 1, ---,s. Now 


bi) uo Te ea 
h.(v, &) i ec ttive 4 bitives 


and each factor is increasing in v since $|\xi_i z_i| \le |\xi'_i z_i|$. Similarly $h_{z'}(v, \xi) / h_z(v, \xi)$
is increasing in v and $|\xi_i|$. Finally, condition (ii') of the Corollary to Theorem 3
is checked in the same manner, and it therefore follows from this corollary
that (C) holds for the density of the Z's.

From this and the fact that the density of the Z's is even in each of the variables
it is seen that the most powerful invariant test for testing $\theta_0$ against $\theta_1$ has a
rejection region which is increasing in $|z_1|, \ldots, |z_s|$. That the probability of
such a region is increasing in $\theta$ is a consequence of the fact that condition (A)
holds, $\theta$ being a scale parameter for the Z's.

As two further illustrations we prove that the noncentral t and F densities
have monotone likelihood ratios, that is, satisfy (C), so that the associated
tests have the minimax property discussed at the beginning of this section.
The first of these results was earlier given by Kruskal [6]; the second was ob- 
tained by Rushton (personal communication) and by Meyer (in “An applica- 
tion of the invariance principle to the Student hypothesis,’ Technical Report 
No. 24, Department of Statistics, Stanford University, unpublished). A result 
containing these two as special cases was obtained about simultaneously with 
the present paper by Karlin (“On distributions (p(z|w) for which p(z|w:) - p(z|we) 
is monotone,” Technical Report No. 26, Department of Statistics, Stanford 
University, unpublished), who considered densities of the form 


gu(z) = (0px) | ee a(t) 


EXAMPLE 4.3. Let $p_\theta(t)$ denote the noncentral $t$ density with noncentrality
parameter $\theta$ (including as a particular case the central density for $\theta = 0$),
that is, the density of Student's $t$ statistic when the sample on which it is based
is drawn from a normal distribution $N(\eta, \sigma^2)$. Then


po(t) - ont exp |-3 (w sas 0 | wre ea in 
0 


where $n$ is the sample size and $\theta = \eta/\sigma$. That $p_{\theta'}(t)/p_\theta(t)$ is an increasing
function of $t$ for $t \le 0$ follows directly from Theorem 3. For $t \ge 0$ it can be seen
by noting that Theorem 3 remains valid if the ratios considered in (ii) and (iii)
are nonincreasing instead of nondecreasing, with the ratio considered in (i)
remaining nondecreasing in $x$.

EXAMPLE 4.4. The noncentral $F$-density with $r$ and $s$ degrees of freedom and
noncentrality parameter $\theta$ is given by

$$p_\theta(u) = \sum_{k=0}^{\infty} P_\theta(k)\, h_{r+k,\,s+k}(u), \qquad u \ge 0,$$

where $h_{r+k,\,s+k}$ is the central $F$-density with $r + k$ and $s + k$ degrees of freedom,
and where

$$P_\theta(k) = \theta^k e^{-\theta}/k!$$

is the Poisson probability with parameter $\theta$. It again follows immediately from
Theorem 3 that for $\theta < \theta'$ the ratio $p_{\theta'}(u)/p_\theta(u)$ is increasing in $u$.

We shall now mention some problems in which the conditions of Theorem 3
do not appear to be satisfied. In these situations it would be of interest to obtain
basic densities $f$ under which the probability ratio test for testing $\theta_0$ against
$\theta_1$ maximizes the minimum power against $\theta \ge \theta_1$. In all of these problems this is
easily shown to be the case when $f$ is the normal density.

PROBLEM 1. Let $X_1, \dots, X_n$ be a sample from $(1/\theta) f((x - \xi)/\theta)$. Then
the distribution of the differences $X_i - X_j$ depends only on $\theta$. It follows from
the Hunt-Stein theorem that for testing $\theta \le \theta_0$ there exists a test depending
only on these differences and which maximizes the minimum power over $\theta \ge \theta_1$.
The problem mentioned then arises for the joint density of these differences,
which is easily written down and which is of course independent of $\xi$. An
elaboration of this problem is the case of two samples from densities $(1/\sigma) f((x - \xi)/\sigma)$
and $(1/\tau) f((y - \eta)/\tau)$ respectively, where $\theta = \sigma/\tau$.

PROBLEM 2. Let $X_1, \dots, X_n$ be a sample from $(1/\sigma) f(x/\sigma - \theta)$. Here it is
the ratios that play the role of the differences in Problem 1. In the two-sample
version of this problem the samples come from $(1/\sigma) f((x - \xi)/\sigma)$ and
$(1/\sigma) f((y - \eta)/\sigma)$, and $\theta = (\eta - \xi)/\sigma$.

PROBLEM 3. Let $X_1, \dots, X_n$ be a sample from $f(x - \theta)$, where $f$ is even, and
consider the problem of testing $|\theta| \le \theta_0$ against $|\theta| \ge \theta_1$. Here one would expect
the test that maximizes the minimum power to be given by the rejection region

$$\frac{f(x_1 - \theta_1) \cdots f(x_n - \theta_1) + f(x_1 + \theta_1) \cdots f(x_n + \theta_1)}{f(x_1 - \theta_0) \cdots f(x_n - \theta_0) + f(x_1 + \theta_0) \cdots f(x_n + \theta_0)} > C.$$

This will be the case provided the probability of this region is an increasing
function of $|\theta|$. The problem is to find conditions on $f$ which would insure this.


5. Comparability of experiments. When a family of distributions $\{P_\theta\}$ is
ordered, it seems reasonable to expect

(D). The pair of distributions $(\theta_0', \theta_1')$ is more* informative than the pair
$(\theta_0, \theta_1)$ in the sense of Blackwell [1] provided $\theta_0' \le \theta_0 < \theta_1 \le \theta_1'$.

Let $\varphi_\alpha$ and $\varphi_\alpha'$ be the most powerful level $\alpha$ tests for testing $\theta_0$ against $\theta_1$ and
$\theta_0'$ against $\theta_1'$ respectively. Then Blackwell showed in [2] that $(\theta_0', \theta_1')$ is more
informative than $(\theta_0, \theta_1)$ if and only if $\beta_\alpha(\theta_1) \le \beta_\alpha'(\theta_1')$ for all $\alpha$, where $\beta_\alpha$ and
$\beta_\alpha'$ denote the power functions of $\varphi_\alpha$ and $\varphi_\alpha'$.

A somewhat stronger property than (D) which one might also expect to hold
in an ordered family is:

(E). Let $\theta_0 < \theta_1$ and let $\lambda_0$, $\lambda_1$ be any distributions over the sets $\theta \le \theta_0$ and
$\theta \ge \theta_1$ respectively. Then the pair of distributions $(\int p_\theta(x)\, d\lambda_0(\theta), \int p_\theta(x)\, d\lambda_1(\theta))$
is more informative than the pair $(p_{\theta_0}(x), p_{\theta_1}(x))$.

Clearly (E) is actually stronger than (D). As a trivial example, let $X$ be
normally distributed with unit variance and mean $\xi$ and let $\theta_0 = \theta_0'$ correspond
to $\xi = 0$, $\theta_1$ to $\xi = -1$ and $\theta_1'$ to $\xi = +1$. Then $(\theta_0, \theta_1)$ and $(\theta_0', \theta_1')$ are equally
informative and both strictly more informative than $(\theta_0, \tfrac12\theta_1 + \tfrac12\theta_1')$. It would
be interesting to know whether more natural examples of this phenomenon
exist such as, for example, a family of densities $g(x - \theta)$ which satisfies (D)
for all $\theta_0' \le \theta_0 < \theta_1 \le \theta_1'$ but for which (E) does not hold.

* Throughout we shall understand with Blackwell "more informative" in the weak sense
of "at least as informative."

A condition equivalent to (E) is:

(E'). For every pair $\theta_0 < \theta_1$ and every $\alpha$ the power function $\beta(\theta)$ of the
probability ratio test for testing $\theta_0$ against $\theta_1$ satisfies

$$\beta(\theta) \le \beta(\theta_0) \ \text{ for } \theta \le \theta_0, \qquad \beta(\theta) \ge \beta(\theta_1) \ \text{ for } \theta \ge \theta_1.$$

To see this, note that (E) states that at every level $\alpha$, the pair of a priori
distributions assigning probability 1 to $\theta_0$ and $\theta_1$ respectively is least favorable
for testing $\theta \le \theta_0$ against $\theta \ge \theta_1$. It follows from Theorem 3.10 of [13] that
(E) implies (E'). The converse is also a special case of a well-known simple
decision-theoretic result, or alternatively can be seen from the proof of Theorem
4. Since (E') is a consequence of (B) + (C), so is (E). On the other hand, the
following example shows that (B) is not enough to insure even (D).

EXAMPLE 5.1. Let $X$ be uniformly distributed over the union of the two
intervals $(\theta - \tfrac34, \theta - \tfrac14)$, $(\theta + \tfrac14, \theta + \tfrac34)$. Then (B) holds since $\theta$ is a location
parameter. On the other hand, the pair of distributions $(\theta = 0, \theta = \tfrac12)$
is clearly strictly more informative than the pair $(\theta = 0, \theta = 1)$.

We shall finally show that (B) + (C) permit an even stronger conclusion 
than (E). 

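THEOREM 4. Let $p_\theta(x)$ be a family of probability densities satisfying (B) and (C).
Let $(\lambda_0, \lambda_1)$ and $(\lambda_0', \lambda_1')$ be two pairs of probability distributions for the parameter $\theta$
such that the three ratios $d\lambda_0/d\lambda_0'$, $d\lambda_1/d\lambda_0$, $d\lambda_1'/d\lambda_1$ are all nondecreasing.
Then the experiment

(5.1)  $\Big(\int p_\theta(x)\, d\lambda_0'(\theta),\ \int p_\theta(x)\, d\lambda_1'(\theta)\Big)$

is more informative than the experiment

$\Big(\int p_\theta(x)\, d\lambda_0(\theta),\ \int p_\theta(x)\, d\lambda_1(\theta)\Big).$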

It is convenient to prove first the following lemma. 

LEMMA 3. Let $x = (x_1, \dots, x_n)$, and let $p_\theta(x)$ be a family of densities satisfying
conditions (B) and (C). Let $\lambda$, $\lambda'$ be two probability measures for $\theta$ such that
$d\lambda'(\theta)/d\lambda(\theta)$ is nondecreasing in $\theta$. Then

(i) $\int p_\theta(x)\, d\lambda'(\theta) \Big/ \int p_\theta(x)\, d\lambda(\theta)$ is nondecreasing in $x$,

(ii) if $\varphi(x)$ is nondecreasing in $x$,

(5.2)  $\int E_\theta \varphi(X)\, d\lambda(\theta) \le \int E_\theta \varphi(X)\, d\lambda'(\theta).$

PROOF. (i) follows from Theorem 3, since $p_{\theta'}(x)/p_\theta(x)$ is nondecreasing in $x$,
and $d\lambda'(\theta)/d\lambda(\theta)$ is nondecreasing in $\theta$. To see (ii), let $\psi(\theta) = E_\theta \varphi(X)$. Then by
(B'), $\psi(\theta)$ is nondecreasing and it is easily seen that

$$\int \psi(\theta)\, d\big(\lambda'(\theta) - \lambda(\theta)\big) \ge 0.$$

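PROOF OF THEOREM 4. Let $\varphi_\alpha$ and $\varphi_\alpha'$ be the most powerful level $\alpha$ tests for
testing $\int p_\theta(x)\, d\lambda_0(\theta)$ against $\int p_\theta(x)\, d\lambda_1(\theta)$, and $\int p_\theta(x)\, d\lambda_0'(\theta)$ against
$\int p_\theta(x)\, d\lambda_1'(\theta)$ respectively. Let

$$\beta(\alpha) = \int E_\theta \varphi_\alpha(X)\, d\lambda_1(\theta), \qquad \beta'(\alpha) = \int E_\theta \varphi_\alpha'(X)\, d\lambda_1'(\theta)$$

denote the power of these two tests for their respective alternatives. Then the
desired result follows if for all $\alpha$ we have $\beta(\alpha) \le \beta'(\alpha)$. It is seen from part (i)
of Lemma 3 that the rejection functions $\varphi_\alpha$ and $\varphi_\alpha'$ are nondecreasing. Therefore,
by part (ii) of the lemma

$$\int E_\theta \varphi_\alpha(X)\, d\lambda_0'(\theta) \le \int E_\theta \varphi_\alpha(X)\, d\lambda_0(\theta) = \alpha$$

so that $\varphi_\alpha$ is a level $\alpha$ test also for the hypothesis $\int p_\theta(x)\, d\lambda_0'(\theta)$, and

$$\int E_\theta \varphi_\alpha(X)\, d\lambda_1'(\theta) \le \beta'(\alpha)$$

since $\beta'(\alpha)$ is the power of the most powerful level $\alpha$ test. Also, by part (ii) of
the lemma

$$\beta(\alpha) = \int E_\theta \varphi_\alpha(X)\, d\lambda_1(\theta) \le \int E_\theta \varphi_\alpha(X)\, d\lambda_1'(\theta),$$

and the result follows.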


In conclusion I should like to thank a referee of this paper for many very 
helpful suggestions. 


6. Appendix. A property of the sign test. It was recently shown by Hoeffding
and Rosenblatt ("The efficiency of tests," Ann. Math. Stat., Vol. 26 (1955),
pp. 52-63) that the sign test is asymptotically most efficient for detecting
a small shift in the distribution with density $\tfrac12 e^{-|x|}$. We shall show below that
the sign test is in fact locally most powerful for testing $H: \theta = 0$ against the
alternatives $\theta > 0$ when

(6.1)  $p_\theta(x_1, \dots, x_n) = \dfrac{1}{2^n}\, e^{-\sum_i |x_i - \theta|}$

for any fixed sample size $n$. In this we shall restrict ourselves to levels of
significance $\alpha$ at which the sign test can be carried out without randomization,
that is, to one of the levels

(6.2)  $\alpha_m = \sum_{k=0}^{m} \binom{n}{k} \Big/ 2^n, \qquad m = 0, 1, \dots, n - 1.$
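
The attainable levels can be tabulated directly. The following Python sketch is an illustration only: the function name `sign_test_levels` is an assumption of this example, and the formula is read as $\alpha_m = \sum_{k=0}^{m} \binom{n}{k}/2^n$ for $m = 0, \dots, n-1$.

```python
from fractions import Fraction
from math import comb


def sign_test_levels(n):
    """Return [alpha_0, ..., alpha_{n-1}], alpha_m = sum_{k<=m} C(n, k) / 2^n."""
    levels = []
    cumulative = 0
    for m in range(n):
        cumulative += comb(n, m)          # add the binomial coefficient C(n, m)
        levels.append(Fraction(cumulative, 2 ** n))
    return levels


levels = sign_test_levels(5)
print([str(a) for a in levels])
```

Exact fractions are used so that the levels are not blurred by floating-point rounding.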


Since the power function $\beta(\theta)$ of a test of this hypothesis may not be differentiable,
we shall state the optimum property of the sign test more precisely as follows.

Let $\beta^*(\theta)$ be the power function of the sign test at one of the levels $\alpha_m$ and
let $\beta_\varphi(\theta)$ be the power function of any other test $\varphi$ of $H$ at the same level. Then
there exists $\Delta$ such that

(6.3)  $\beta_\varphi(\theta) < \beta^*(\theta)$ for $0 < \theta < \Delta$.

To prove this, let us denote by $R_k$ ($k = 0, \dots, n$) the subset of the sample
space in which $k$ of the X's are positive and $n - k$ are negative. The proof
follows easily from the following lemma.

LEMMA 4. Let $0 \le k < l \le n$ and let $S_k$, $S_l$ be subsets of $R_k$ and $R_l$ respectively
for which

(6.4)  $P_0(S_k) = P_0(S_l).$

Then there exists $\Delta_{k,l}$ such that

(6.5)  $P_\theta(S_k) < P_\theta(S_l)$ for $0 < \theta < \Delta_{k,l}$.


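PROOF. We note that

$$\frac{e^{-|x - \theta|}}{e^{-|x|}} = \begin{cases} e^{-\theta} & \text{if } x < 0, \\ e^{\theta} & \text{if } \theta < x, \end{cases}$$

and that $e^{-\theta} < e^{2x - \theta} < e^{\theta}$ if $0 < x < \theta$. Let $S_{l\theta}$ denote the subset of $S_l$ for
which the $l$ positive x's are all $> \theta$. Then

$$P_\theta(S_l) \ge e^{(2l - n)\theta} P_0(S_{l\theta}) + e^{-n\theta} P_0(S_l - S_{l\theta}),$$

$$P_\theta(S_k) \le e^{(2k - n)\theta} P_0(S_k).$$

Putting $\eta(\theta) = P_0(S_l - S_{l\theta})$ and denoting the common value of (6.4) by $\gamma$,
we therefore have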


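$$P_\theta(S_l) - P_\theta(S_k) \ge e^{(2l - n)\theta}[\gamma - \eta(\theta)] + e^{-n\theta}\eta(\theta) - e^{(2k - n)\theta}\gamma.$$

This will be positive provided

$$\gamma\big[e^{(2l - n)\theta} - e^{(2k - n)\theta}\big] > \eta(\theta)\big[e^{(2l - n)\theta} - e^{-n\theta}\big].$$

Up to terms of order $\theta$, the left- and right-hand sides are respectively $2\gamma(l - k)\theta$
and $2l\,\eta(\theta)\theta$. Since $\eta(\theta) \to 0$ as $\theta \to 0$, it follows that the desired inequality
holds when $\theta$ is sufficiently small.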
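
The density ratio on which this proof rests is easy to check numerically. The sketch below is only an illustration; the helper name `ratio` and the particular values of $x$ and $\theta$ are assumptions chosen for the example. It evaluates $e^{-|x-\theta|}/e^{-|x|}$ in the three regions $x < 0$, $0 < x < \theta$ and $\theta < x$.

```python
import math


def ratio(x, theta):
    """Likelihood ratio of the double-exponential density at shift theta vs 0."""
    return math.exp(-abs(x - theta)) / math.exp(-abs(x))


theta = 0.3
left = ratio(-1.2, theta)    # x < 0: equals exp(-theta)
middle = ratio(0.1, theta)   # 0 < x < theta: equals exp(2x - theta)
right = ratio(0.9, theta)    # theta < x: equals exp(theta)
print(left, middle, right)
```

The middle value lies strictly between the two outer ones, which is the bound used for the points of $S_l$ having a positive coordinate below $\theta$.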

The result expressed by (6.3) is now an obvious consequence when the
alternative test $\varphi$ is nonrandomized. For consider any rejection region that does not
consist of the upper tail of a sign test. Then it can be converted into a sign
test of the same size by a finite number of steps, each of which consists in
replacing an $S_k$ by an $S_l$ with $k < l$ which satisfies (6.4).

Only minor modifications of the argument are required in case the alternative
test $\varphi$ is randomized. In particular, in the lemma, the sets $S_k$ and $S_l$ are
replaced by critical functions $\varphi_k$ and $\varphi_l$ over $R_k$ and $R_l$ respectively, such that

$$E_0 \varphi_k(X_1, \dots, X_n) = E_0 \varphi_l(X_1, \dots, X_n),$$

the conclusion being that

$$E_\theta \varphi_k(X_1, \dots, X_n) < E_\theta \varphi_l(X_1, \dots, X_n) \quad \text{for } 0 < \theta < \Delta_{k,l}.$$


It is interesting to note that the sign test, being similar for testing $H: \theta = 0$
when the density (6.1) involves an unknown scale parameter, is also locally
most powerful for that problem.

It should be mentioned finally that the above proof may be modified to show
that the two-sided sign test maximizes $\tfrac12[\beta(\theta) + \beta(-\theta)]$ for sufficiently small $\theta$.
This test is therefore locally most powerful among all tests that are symmetric
with respect to the origin.

ON AN APPLICATION OF KRONECKER PRODUCT OF
MATRICES TO STATISTICAL DESIGNS


MANOHAR NARHAR VARTAK
University of Bombay

1. Summary. By a statistical design (or simply, a design) we mean an
arrangement of a certain number of "treatments" in a certain number of "blocks" in
such a way that some prescribed combinatorial conditions are fulfilled. With
every design is associated a unique matrix called the incidence matrix of the
design (definitions, etc., in subsequent sections). In many instances, e.g., [7], [8],
[10], [12], [16], information regarding certain kinds of designs such as BIB,
PBIB designs is obtained from properties of the matrix $NN'$ or of its
determinant $|NN'|$ where $N$ is the incidence matrix of the design under consideration.
On the other hand in a few cases, such as [4], [5], [11], [14], [15], the incidence
matrix $N$ itself has been used to investigate properties of designs. This paper
gives a method of using incidence matrices of known designs to obtain new
designs.

In Section 2 we have defined the Kronecker product of matrices. This defini- 
tion and some properties of the Kronecker product of matrices are given in [1]. 
Section 3 is devoted to a general discussion of an application of the concept of 
the Kronecker product of matrices to define the Kronecker product of designs. 
This section also contains two theorems which illustrate the use of the method 
of obtaining Kronecker products of designs. Definitions of some well-known de- 
signs are given in Section 4, which also contains a number of results giving ex- 
plicit forms of certain Kronecker products. Finally some illustrations of a few 
results of Section 4 are given in Section 5. 


2. The Kronecker product of matrices. Let

(2.1)  $A = (a_{ij}), \quad B = (b_{kl}), \quad I_u, \quad O_{m \times n}$

be respectively an $m \times n$ matrix, a $p \times q$ matrix, the identity matrix of order $u$,
the null or zero matrix of order $m \times n$, all defined over the set of non-negative
integers. The Kronecker product of matrices $A$ and $B$ is defined as follows.

DEFINITION 2.1. The Kronecker product $A \times B$ of matrices $A$ and $B$ of
(2.1) is defined by

(2.2)  $A \times B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix},$

where $a_{ij}B$ ($i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$) is itself a $p \times q$ matrix.

We shall always use an "$\times$" in the product of matrices to denote the Kronecker
product. The ordinary product of matrices $A$ and $B$ (when it exists) will be
denoted by $A \cdot B$ or $AB$.

It is clear from Definition 2.1 that the Kronecker product always exists and
that $A \times B$ is an $mp \times nq$ matrix defined over the set of non-negative integers.
Also it is obvious that the Kronecker product of two matrices reduces to the
ordinary product if and only if one of the matrices is a scalar.

The result contained in the following theorem will be used later in Section 3. 

THEOREM 2.1. For any matrices $A$ and $B$ as in (2.1) we must have

(2.3)  $A \times B = P \cdot (B \times A) \cdot Q$

where the matrices $P$ and $Q$ are obtained from the identity matrices $I_{mp}$ and $I_{nq}$
respectively by permuting rows and columns.

It should be noted that $P$ and $Q$ are nonsingular matrices whose elements
consist only of 0's and 1's, and that the matrices $P$ and $Q$ are the same for any $A$
and $B$ as defined in (2.1).

A proof of Theorem 2.1 can be constructed from that of a similar result proved 
by Murnaghan [1], who gives various other properties of the Kronecker product. 
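
Definition 2.1 and Theorem 2.1 can be checked on small matrices. The following Python sketch is only an illustration: the helper names `kron`, `matmul` and `commutation_pair`, and the explicit index formulas used to build $P$ and $Q$, are assumptions of this example and are not taken from [1].

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows (Definition 2.1)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matmul(X, Y):
    """Ordinary matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def commutation_pair(m, n, p, q):
    """Permutation matrices P (mp x mp), Q (nq x nq) with A x B = P (B x A) Q."""
    P = [[0] * (m * p) for _ in range(m * p)]
    for i in range(m):
        for k in range(p):
            P[i * p + k][k * m + i] = 1   # row (i, k) of A x B is row (k, i) of B x A
    Q = [[0] * (n * q) for _ in range(n * q)]
    for j in range(n):
        for l in range(q):
            Q[l * n + j][j * q + l] = 1   # column (l, j) of B x A goes to column (j, l)
    return P, Q


A = [[1, 0, 2], [0, 1, 1]]        # m = 2, n = 3
B = [[1, 1], [0, 1], [2, 0]]      # p = 3, q = 2
P, Q = commutation_pair(2, 3, 3, 2)
print(kron(A, B) == matmul(matmul(P, kron(B, A)), Q))
```

Because $P$ and $Q$ depend only on the orders $m, n, p, q$, the same pair serves for every $A$ and $B$ of those orders, as noted after the theorem.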


3. The Kronecker product of designs. Let $D_p$, $p = 1, 2$, be a design in which
$v_p$ treatments are arranged in $b_p$ blocks. Let $N_p$, the incidence matrix of the
design $D_p$, be defined by

(3.1)  $N_p = (n^{(p)}_{i_p j_p}), \qquad i_p = 1, 2, \dots, v_p; \quad j_p = 1, 2, \dots, b_p,$

where $n^{(p)}_{i_p j_p}$ is the number of times the $i_p$th treatment of $D_p$ occurs in the $j_p$th block
of $D_p$. Clearly $n^{(p)}_{i_p j_p}$ is a non-negative integer so that $N_p$ is defined over the set of
non-negative integers. Since a design uniquely determines its incidence matrix
and vice versa, we may denote both a design and its incidence matrix by the
same symbol. Also the treatments and blocks of a design correspond respectively
to the rows and columns of the incidence matrix of the design.

Let $N_1$ and $N_2$ be the designs defined by (3.1). Then

(3.2)  $N_{12} = N_1 \times N_2$

uniquely determines a design and so does

(3.3)  $N_{21} = N_2 \times N_1.$

Theorem 2.1 at once leads to the following theorem.

THEOREM 3.1. If $N_1$ and $N_2$ are designs defined by (3.1), then the designs $N_{12}$
and $N_{21}$ defined respectively by (3.2) and (3.3) are structurally the same, i.e., one
of them can be obtained from the other by simply renaming the treatments and
renumbering the blocks.

This theorem enables us to designate the designs $N_{12}$ and $N_{21}$ by a common
symbol $N$, the incidence matrix of $N$ being taken to be $N_1 \times N_2$ or $N_2 \times N_1$,
whichever is convenient.

Since the incidence matrix of the design $N$ obtained above is the Kronecker
product of the incidence matrices of the designs $N_1$ and $N_2$, we may say that the
design $N$ is the Kronecker product of the designs $N_1$ and $N_2$.

We now examine a few matrices and the corresponding designs. 

3(a). Let $N_1$ be a row $n$-vector

(3.4)  $N_1 = (1 \ 1 \ \cdots \ 1),$

there being $n$ 1's on the right-hand side. The design $N_1$ clearly consists of $n$
blocks each of size one, each block being treated by the same single treatment.

3(b). Let $N_2$ be a column $m$-vector

(3.5)  $N_2 = (1 \ 1 \ \cdots \ 1)',$

there being $m$ 1's on the right-hand side. The design $N_2$ is a single replication of
$m$ treatments in one block of size $m$.

If $N_0$ be any design, then with $N_1$ as in (3.4), we have

$$N^{(1)} = N_1 \times N_0 = (N_0 \ N_0 \ \cdots \ N_0),$$

there being $n$ $N_0$'s on the right-hand side. This means that the design $N^{(1)}$ is
nothing but $n$ replications of the design $N_0$ as a whole. Again if $N_0$ be any
design, then with $N_2$ as in (3.5), we have

$$N^{(2)} = N_0 \times N_2,$$

where clearly $N^{(2)}$ defines a design which is obtained from $N_0$ by replacing each
treatment of $N_0$ by a group of $m$ treatments. Also the rows of $N^{(2)}$ consist only
of $m$ repetitions of each row of $N_0$.

These two results can be combined into the following theorem. 

THEOREM 3.2. If $N_0$ be any design and if $N_1$ and $N_2$ be as defined in (3.4) and
(3.5) respectively, then the designs $N^{(1)} = N_1 \times N_0$ and $N^{(2)} = N_0 \times N_2$ are
respectively

(i) $n$ replications of the design $N_0$ as a whole, and

(ii) the design obtained from $N_0$ by replacing each of its treatments by a group of $m$
treatments so that the rows of $N^{(2)}$ consist only of $m$ repetitions of each row of $N_0$.
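
Both parts of the theorem can be seen on a small incidence matrix. The Python sketch below is only an illustration; the helper `kron` and the particular design `N0` are assumptions of this example.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


n, m = 3, 2
N1 = [[1] * n]                    # row n-vector of (3.4)
N2 = [[1] for _ in range(m)]      # column m-vector of (3.5)
N0 = [[1, 1, 0],                  # an arbitrary small design: 2 treatments, 3 blocks
      [0, 1, 1]]

rep = kron(N1, N0)                # n copies of N0 placed side by side
grouped = kron(N0, N2)            # every row of N0 repeated m times
rbd = kron(N1, N2)                # m x n matrix of 1's

print(rep)
print(grouped)
print(rbd)
```

The last product, an $m \times n$ matrix of 1's, is the incidence matrix of a randomized block design with $m$ treatments and $n$ complete blocks.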

The following corollaries to Theorem 3.2 are trivial. 

COROLLARY 3.2.1. A randomized block design, $N_3$, with $m$ treatments and $n$
blocks, each block being a complete replication, is the Kronecker product

(3.6)  $N_3 = N_1 \times N_2$

of the designs $N_1$ and $N_2$ defined in (3.4) and (3.5) respectively.

COROLLARY 3.2.2. If $N_0$ be any design and $N_3$ be the randomized block design
defined in (3.6), then $N_3 \times N_0$ defines a design which contains $n$ replications of the
design derived from $N_0$ by replacing each of its treatments by a group of $m$ treatments.

3(c). The design corresponding to $I_u$, the identity matrix of order $u$, contains
$u$ treatments and $u$ blocks each of size one, and the $i$th block contains a single
plot to which the $i$th treatment is applied, $i = 1, 2, \dots, u$.

The following corollary to Theorem 3.2 is also trivial.

COROLLARY 3.2.3. With $N_2$ as defined in (3.5), we have

$$N = I_u \times N_2,$$

which defines a design $N$ useful for confounding with blocks the effects of certain
treatment combinations of a factorial design when $u$ and $m$ have suitable values.

It may be noted that if $N_0$ be any design, then the Kronecker product $I_u \times N_0$
is always a disconnected design, and therefore no further illustrations involving
$I_u$ will be given.
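
The disconnectedness of $I_u \times N_0$ comes from its block-diagonal form. The sketch below is only an illustration; `kron`, `identity` and the small design `N0` are assumptions of this example. It verifies that every entry outside the $u$ diagonal copies of $N_0$ is zero, so treatments in different copies never share a block.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def identity(u):
    return [[1 if i == j else 0 for j in range(u)] for i in range(u)]


N0 = [[1, 1, 0],
      [0, 1, 1]]
u = 3
N = kron(identity(u), N0)
v0, b0 = len(N0), len(N0[0])

# Entries lying outside the diagonal blocks of the partitioned matrix.
off_block = [N[r][c]
             for r in range(len(N)) for c in range(len(N[0]))
             if r // v0 != c // b0]
print(all(x == 0 for x in off_block))
```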


4. Special cases of Kronecker products of designs. We first define a few
designs.

4(a). Design $N_2$ is already defined in (3.5).

4(b). A balanced incomplete block (BIB) design $N_{BIB}$ with parameters $v^*$,
$b^*$, $r^*$, $k^*$, $\lambda^*$ is defined to be the one in which the $v^*$ treatments are arranged in
the $b^*$ blocks of size $k^*$ each, such that

(i) the treatments in any block are all distinct,

(ii) each treatment is replicated $r^*$ times, and

(iii) every pair of treatments occurs together in $\lambda^*$ blocks (cf. [12]).

4(c). A partially balanced incomplete block (PBIB) design $N^{(s)}_{PBIB}$ with $s$
associate classes and with parameters

$$v, \ b, \ r, \ k, \ n_i, \ \lambda_i, \ p^i_{jk}; \qquad i, j, k = 1, 2, \dots, s,$$

is defined as follows.

(i) There are $v$ treatments arranged in $b$ blocks each of size $k$ such that each
treatment is replicated $r$ times and the treatments in any block are all distinct.

(ii) There can be established a relation of association between any two
treatments, satisfying the following conditions.

($\alpha$) Two treatments are either 1st, 2nd, $\dots$, or $s$th associates.

($\beta$) Each treatment has $n_i$ $i$th associates, $i = 1, 2, \dots, s$.

($\gamma$) Given any two treatments which are $i$th associates, the number of
treatments which are common to the $j$th associates of the first and the $k$th associates
of the second is $p^i_{jk}$ and is independent of the pair of treatments with which we
start.

(iii) Two treatments which are $i$th associates occur together in exactly $\lambda_i$
blocks, $i = 1, 2, \dots, s$ (cf. [8]).

PBIB designs with two associate classes have been extensively investigated
by Bose and Shimamoto [9].

When the parameters $\lambda_1, \lambda_2, \dots, \lambda_s$ of a PBIB design are not all different,
the $s$ associate classes of the PBIB design may not be all distinct. The following
lemma, which is a modification of a remark by Rao [3], gives a criterion to
determine whether the PBIB design has $s$ or fewer distinct associate classes when
its parameters $\lambda_1, \lambda_2, \dots, \lambda_s$ are not all different.
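
LEMMA 4.1. Let a PBIB design $N^{(s)}_{PBIB}$ with $s$ associate classes and with
parameters

$$v, \ b, \ r, \ k, \ n_i, \ \lambda_i, \ p^i_{jk}; \qquad i, j, k = 1, 2, \dots, s,$$

be such that $\lambda_1, \lambda_2, \dots, \lambda_s$ are not all different so that at least two of them are
equal. Without loss of generality we can assume that $\lambda_1 = \lambda_2$. In this case the
number of associate classes of the design $N^{(s)}_{PBIB}$ can be reduced from $s$ to $s - 1$ by
combining its first two associate classes if and only if

(4.1)  $\begin{pmatrix} \sum_{u,w=1}^{2} p^1_{uw} & \sum_{u=1}^{2} p^1_{u3} & \cdots & \sum_{u=1}^{2} p^1_{us} \\ \sum_{w=1}^{2} p^1_{3w} & p^1_{33} & \cdots & p^1_{3s} \\ \vdots & \vdots & & \vdots \\ \sum_{w=1}^{2} p^1_{sw} & p^1_{s3} & \cdots & p^1_{ss} \end{pmatrix} = \begin{pmatrix} \sum_{u,w=1}^{2} p^2_{uw} & \sum_{u=1}^{2} p^2_{u3} & \cdots & \sum_{u=1}^{2} p^2_{us} \\ \sum_{w=1}^{2} p^2_{3w} & p^2_{33} & \cdots & p^2_{3s} \\ \vdots & \vdots & & \vdots \\ \sum_{w=1}^{2} p^2_{sw} & p^2_{s3} & \cdots & p^2_{ss} \end{pmatrix}.$

Further if (4.1) holds, then the parameters of the reduced PBIB design with $s - 1$
associate classes are

$$v' = v, \quad b' = b, \quad r' = r, \quad k' = k,$$

$$n'_1 = n_1 + n_2, \quad n'_x = n_{x+1}, \qquad \lambda'_1 = \lambda_1 = \lambda_2, \quad \lambda'_x = \lambda_{x+1},$$

$$(p'^{\,1}_{yz}) = \begin{pmatrix} \sum_{u,w=1}^{2} p^t_{uw} & \sum_{u=1}^{2} p^t_{u3} & \cdots & \sum_{u=1}^{2} p^t_{us} \\ \sum_{w=1}^{2} p^t_{3w} & p^t_{33} & \cdots & p^t_{3s} \\ \vdots & \vdots & & \vdots \\ \sum_{w=1}^{2} p^t_{sw} & p^t_{s3} & \cdots & p^t_{ss} \end{pmatrix},$$

$$(p'^{\,x}_{yz}) = \begin{pmatrix} \sum_{u,w=1}^{2} p^{x+1}_{uw} & \sum_{u=1}^{2} p^{x+1}_{u3} & \cdots & \sum_{u=1}^{2} p^{x+1}_{us} \\ \sum_{w=1}^{2} p^{x+1}_{3w} & p^{x+1}_{33} & \cdots & p^{x+1}_{3s} \\ \vdots & \vdots & & \vdots \\ \sum_{w=1}^{2} p^{x+1}_{sw} & p^{x+1}_{s3} & \cdots & p^{x+1}_{ss} \end{pmatrix},$$

where $t = 1$ or $2$; $x = 2, 3, \dots, s - 1$; $y, z = 1, 2, \dots, s - 1$.

It follows that repeated applications of Lemma 4.1 to any PBIB design will
ultimately give a PBIB design whose associate classes are all distinct.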

The following results give the Kronecker products of various pairs of designs
chosen from 4(a), 4(b), and 4(c).

THEOREM 4.1. (a) The Kronecker product $N = N_2 \times N^{(s)}_{PBIB}$ of the design $N_2$
of (3.5) and a PBIB design $N^{(s)}_{PBIB}$ with $s$ associate classes and with parameters

$$v, \ b, \ r, \ k, \ n_i, \ \lambda_i, \ p^i_{jk}; \qquad i, j, k = 1, 2, \dots, s,$$

is a PBIB design with at most $s + 1$ associate classes.

(b) The design $N$ defined above has $s + 1$ distinct associate classes if the design
$N^{(s)}_{PBIB}$ has $s$ distinct associate classes and $\lambda_i < r$ for all $i = 1, 2, \dots, s$.

(c) In any case the parameters of the design $N$ can be expressed in terms of those
of the designs $N_2$ and $N^{(s)}_{PBIB}$ by the equations:

$$v' = mv, \quad b' = b, \quad r' = r, \quad k' = mk,$$

$$n'_i = m n_i, \quad n'_{s+1} = m - 1, \qquad \lambda'_i = \lambda_i, \quad \lambda'_{s+1} = r,$$

$$(p'^{\,i}_{yz}) = \begin{pmatrix} m\,(p^i_{jk}) & (m - 1)(\delta_{ij}) \\ (m - 1)(\delta_{ik}) & 0 \end{pmatrix}, \qquad (p'^{\,s+1}_{yz}) = \begin{pmatrix} m\,(n_j \delta_{jk}) & O_{s \times 1} \\ O_{1 \times s} & m - 2 \end{pmatrix},$$

where $i, j, k = 1, 2, \dots, s$; $y, z = 1, 2, \dots, s + 1$; $\delta_{\alpha\beta} = 0$ if $\alpha \ne \beta$ and $\delta_{\alpha\beta} = 1$
if $\alpha = \beta$ for all $\alpha, \beta = 1, 2, \dots$

(d) Lemma 4.1 can be applied to the cases in which the conditions in (b) above
are not fulfilled.

The essence of Theorem 4.1 appears in a paper by Zelen [17]. 

COROLLARY 4.1.1. The Kronecker product $N = N_2 \times N_{BIB}$ of the design $N_2$
of (3.5) and a BIB design $N_{BIB}$ with parameters $v^*$, $b^*$, $r^*$, $k^*$, $\lambda^*$ is a singular GD
(group divisible) design with parameters:

(4.4)  $v' = mv^*, \quad b' = b^*, \quad r' = r^*, \quad k' = mk^*, \quad m' = v^*, \quad n' = m, \quad \lambda_1 = r^*, \quad \lambda_2 = \lambda^*.$

It should be noted that singular GD designs can be obtained only by using
Corollary 4.1.1 as was shown by Bose and Connor [8].
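
The parameters in the corollary can be confirmed on a small case. The Python sketch below is only an illustration: the choice of the seven-point BIB design with $v^* = b^* = 7$, $r^* = k^* = 3$, $\lambda^* = 1$, the value $m = 2$, and the helper names `kron` and `concurrence` are assumptions of this example. It forms $N_2 \times N_{BIB}$ and counts how often each pair of treatments occurs together.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def concurrence(N):
    """N N': entry (x, y) counts the blocks containing both treatments x and y."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in N] for rx in N]


BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
          (1, 4, 6), (2, 3, 6), (2, 4, 5)]
v_star, r_star, k_star, lam_star = 7, 3, 3, 1
N_bib = [[1 if t in blk else 0 for blk in BLOCKS] for t in range(v_star)]

m = 2
N2 = [[1] for _ in range(m)]      # column m-vector of (3.5)
N = kron(N2, N_bib)               # m copies of N_bib stacked vertically
C = concurrence(N)

row_sums = {sum(row) for row in N}          # replications r'
col_sums = {sum(col) for col in zip(*N)}    # block sizes k'
# Rows x and y belong to the same group when they copy the same BIB treatment.
same_group = {C[x][y] for x in range(len(N)) for y in range(len(N))
              if x != y and x % v_star == y % v_star}
other = {C[x][y] for x in range(len(N)) for y in range(len(N))
         if x % v_star != y % v_star}
print(row_sums, col_sums, same_group, other)
```

Treatments of the same group concur in $r^*$ blocks, the same number as the replication, which is what makes the resulting group divisible design singular.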

THEOREM 4.2. (a) The Kronecker product $N = N^{(s)}_{PBIB} \times N^{(t)}_{PBIB}$ of two PBIB
designs $N^{(s)}_{PBIB}$ and $N^{(t)}_{PBIB}$ with $s$ and $t$ associate classes respectively and with
respective sets of parameters

(4.5)  $v_1, \ b_1, \ r_1, \ k_1, \ n_{1i_1}, \ \lambda_{1i_1}, \ p^{i_1}_{1 j_1 k_1}; \qquad i_1, j_1, k_1 = 1, 2, \dots, s,$

(4.6)  $v_2, \ b_2, \ r_2, \ k_2, \ n_{2i_2}, \ \lambda_{2i_2}, \ p^{i_2}_{2 j_2 k_2}; \qquad i_2, j_2, k_2 = 1, 2, \dots, t,$

is a PBIB design with at most $t + s + ts$ associate classes.

(b) The design $N$ defined above has $t + s + ts$ distinct associate classes if

(i) the $s$ associate classes of $N^{(s)}_{PBIB}$ and the $t$ associate classes of $N^{(t)}_{PBIB}$ are
all distinct,

(ii) $\lambda_{1i_1} < r_1$, $\lambda_{2i_2} < r_2$, and

(iii) $r_1 \lambda_{2i_2} \ne r_2 \lambda_{1i_1}$ for all $i_1 = 1, 2, \dots, s$ and $i_2 = 1, 2, \dots, t$.

(c) In any case the parameters of the design $N$ can be expressed in terms of those
of $N^{(s)}_{PBIB}$ and $N^{(t)}_{PBIB}$ by the equations:

$$v' = v_1 v_2, \quad b' = b_1 b_2, \quad r' = r_1 r_2, \quad k' = k_1 k_2,$$

$$n'_{i_2} = n_{2i_2}, \quad n'_{t+i_1} = n_{1i_1}, \quad n'_{t+i_1+i_2 s} = n_{1i_1} n_{2i_2},$$

$$\lambda'_{i_2} = r_1 \lambda_{2i_2}, \quad \lambda'_{t+i_1} = r_2 \lambda_{1i_1}, \quad \lambda'_{t+i_1+i_2 s} = \lambda_{1i_1} \lambda_{2i_2},$$
| (p33 4k») Orxe Orger 
aE I cuueseiiionaunnies 
( Dy:*) _ On One | (5:52) x (715,55 ,%,) 
Osx Ae agile oo reg are x (115,8j,8;) | (33.42) X (m1j,5i.6) 
| ene pe Ors (2; 5) ks) Xx (6;,,) 
(4.7) (py*") = On (pi}.e) Onrcos 
————_—— ceed 
(n2;,5 i aks) x (8;,:,) Ostxe (25,5; 2k) x (pide) 
{ 
| : 
Ore (5;.i2) x (6:,5,) (p23 2k.) x (6;,5,) 
thistige : ‘1 : 
Pyz ) ~~ (5:45) x (6;,4,) | Ox | Ce) x (pib.e) 





(p23 ok.) x (8;,:,) (5;.i.) x (pi} x) (p23 4x2) x (pi},e,) 
where $i_1, j_1, k_1 = 1, 2, \dots, s$; $i_2, j_2, k_2 = 1, 2, \dots, t$; $x, y, z = 1, 2, \dots, t + s + ts$;
$\delta_{\alpha\beta} = 0$ if $\alpha \ne \beta$ and $\delta_{\alpha\beta} = 1$ if $\alpha = \beta$ for all $\alpha, \beta = 1, 2, \dots$

(d) Lemma 4.1 can be applied to the cases in which the conditions in (b) above
are not fulfilled.

PROOF. Let us consider the Kronecker product of the designs $N^{(s)}_{PBIB}$ and $N^{(t)}_{PBIB}$
in the form $N = N^{(s)}_{PBIB} \times N^{(t)}_{PBIB}$. Since $N^{(s)}_{PBIB}$ and $N^{(t)}_{PBIB}$ are PBIB designs
with parameters (4.5) and (4.6) respectively, it follows that their incidence matrices
are of orders $v_1 \times b_1$ and $v_2 \times b_2$, respectively. Also the elements
of these matrices consist only of 0's and 1's, there being $r_1$ 1's in every row and
$k_1$ 1's in every column of $N^{(s)}_{PBIB}$ and $r_2$ 1's in every row and $k_2$ 1's in every column
of $N^{(t)}_{PBIB}$. In obtaining $N$ from $N^{(s)}_{PBIB}$ and $N^{(t)}_{PBIB}$ we replace every 1 in $N^{(s)}_{PBIB}$
by the matrix $N^{(t)}_{PBIB}$ and every 0 in $N^{(s)}_{PBIB}$ by the null matrix $O_{v_2 \times b_2}$. From this it
follows that the incidence matrix of $N$ is a $v_1 v_2 \times b_1 b_2$ matrix whose elements
consist only of 0's and 1's, there being $r_1 r_2$ 1's in every row and $k_1 k_2$ 1's in
every column of $N$. This means that the parameters $v'$, $b'$, $r'$, $k'$ of $N$ given by

(4.8)  $v' = v_1 v_2, \quad b' = b_1 b_2, \quad r' = r_1 r_2, \quad k' = k_1 k_2$

have their usual significance for the design $N$.
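
The counts in (4.8) can be checked on two small designs. The sketch below is only an illustration: the two incidence matrices (a group divisible design on 4 treatments and the design of all pairs from 3 treatments) and the helper `kron` are assumptions of this example.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


# Design 1: groups {0, 1} and {2, 3}; blocks are the pairs from different groups.
# v1 = 4, b1 = 4, r1 = 2, k1 = 2.
Na = [[1, 1, 0, 0],
      [0, 0, 1, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1]]
# Design 2: all pairs from 3 treatments; v2 = 3, b2 = 3, r2 = 2, k2 = 2.
Nb = [[1, 1, 0],
      [1, 0, 1],
      [0, 1, 1]]

N = kron(Na, Nb)
v, b = len(N), len(N[0])
row_sums = {sum(row) for row in N}
col_sums = {sum(col) for col in zip(*N)}
print(v, b, row_sums, col_sums)
```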

We shall now identify the various associate classes of a treatment of $N$. Let
the first row of $N^{(s)}_{PBIB}$ correspond to the treatment $\Theta$ of $N^{(s)}_{PBIB}$, that of $N^{(t)}_{PBIB}$
to the treatment $\theta'$ of $N^{(t)}_{PBIB}$, and that of $N$ to the treatment $\theta$ of $N$. We shall
identify the various associate classes of $\theta$ in $N$.

The row corresponding to $\Theta$ in $N^{(s)}_{PBIB}$ contains $r_1$ 1's, all other elements in the
row being 0. In obtaining $N$ each of these $r_1$ 1's is replaced by the matrix $N^{(t)}_{PBIB}$
and each of the 0's is replaced by the null matrix $O_{v_2 \times b_2}$. Hence the first $v_2$ rows
of $N$ contain exactly $r_1$ replications of the design $N^{(t)}_{PBIB}$ as a whole and nothing
else. Consider one of these $r_1$ replications. Its first row, which corresponds to a
section of $\theta$ in $N$, also corresponds to $\theta'$ in $N^{(t)}_{PBIB}$. Consider the $i_2$th associates,
$i_2 = 1, 2, \dots, t$, of $\theta'$ in $N^{(t)}_{PBIB}$ which occur in the replication of $N^{(t)}_{PBIB}$ under
consideration. These are $n_{2i_2}$ in number and each of them occurs together with
$\theta'$ in $\lambda_{2i_2}$ blocks of the replication of $N^{(t)}_{PBIB}$. When we take into account all the
$r_1$ replications of $N^{(t)}_{PBIB}$ in the first $v_2$ rows of $N$, we find that each of the $n_{2i_2}$ $i_2$th
associates of $\theta'$ in $N^{(t)}_{PBIB}$, considered as treatments of $N$, will occur together with
$\theta$ in $r_1 \lambda_{2i_2}$ blocks of $N$. We take these $n_{2i_2}$ treatments of $N$ to be the $i_2$th
associates of $\theta$ in $N$. This identifies the first $t$ associate classes of $\theta$ in $N$ and we see
that the parameters $n'_{i_2}$, $\lambda'_{i_2}$ of $N$ given by

(4.9)  $n'_{i_2} = n_{2i_2}, \quad \lambda'_{i_2} = r_1 \lambda_{2i_2}, \qquad i_2 = 1, 2, \dots, t,$

have their usual significance for the design $N$.

Now consider the $i_1$th associates, $i_1 = 1, 2, \dots, s$, of $\Theta$ in $N^{(s)}_{PBIB}$. These are
$n_{1i_1}$ in number, and each of them occurs together with $\Theta$ in $\lambda_{1i_1}$ blocks of $N^{(s)}_{PBIB}$.
Consider one of the $n_{1i_1}$ $i_1$th associates of $\Theta$. The row in $N^{(s)}_{PBIB}$ corresponding to
this $i_1$th associate also contains $r_1$ 1's and all other elements in the row are 0's.
In obtaining $N$ each of these $r_1$ 1's is replaced by the matrix $N^{(t)}_{PBIB}$ and each
of the 0's is replaced by the null matrix $O_{v_2 \times b_2}$. Hence the row in $N^{(s)}_{PBIB}$
corresponding to the $i_1$th associate of $\Theta$ under consideration gives rise to only $r_1$
replications in $N$ of the design $N^{(t)}_{PBIB}$ as a whole and nothing else. Out of these
$r_1$ replications of $N^{(t)}_{PBIB}$ only $\lambda_{1i_1}$ can be paired off with similar replications of
$N^{(t)}_{PBIB}$ in $N$ arising out of the row corresponding to $\Theta$, because $\Theta$ occurs together
with any of its $i_1$th associates in $\lambda_{1i_1}$ blocks of $N^{(s)}_{PBIB}$. Consider one of these $\lambda_{1i_1}$
pairs of replications of $N^{(t)}_{PBIB}$. The first row in each component replication of
$N^{(t)}_{PBIB}$ in this pair is identical with that corresponding to $\theta'$. One of these first
rows is a section of that corresponding to $\theta$ in $N$; we define the treatment in $N$
corresponding to the other first row to be a $(t + i_1)$th associate of $\theta$ in $N$. Since
$\theta'$ is replicated $r_2$ times in $N^{(t)}_{PBIB}$, it follows that in the pair of replications of
$N^{(t)}_{PBIB}$ considered above $\theta$ and the other treatment, which is defined to be a
$(t + i_1)$th associate of $\theta$ in $N$, occur together in $r_2$ blocks of $N$. Taking into
account the $\lambda_{1i_1}$ pairs of replications of $N^{(t)}_{PBIB}$ brought to notice earlier, it follows
that $\theta$ and its $(t + i_1)$th associate in $N$ occur together in $r_2 \lambda_{1i_1}$ blocks of $N$. Also
remembering that the number of the $i_1$th associates of $\Theta$ in $N^{(s)}_{PBIB}$ is $n_{1i_1}$ we see
that the number of the $(t + i_1)$th associates of $\theta$ in $N$ is also $n_{1i_1}$. This identifies $s$
more associate classes of $\theta$ in $N$ and we see that the parameters $n'_{t+i_1}$, $\lambda'_{t+i_1}$ of
$N$ given by

(4.10)  $n'_{t+i_1} = n_{1i_1}, \quad \lambda'_{t+i_1} = r_2 \lambda_{1i_1}, \qquad i_1 = 1, 2, \dots, s,$

have their usual significance for the design $N$.

Consider again for a moment the above pair of replications of $N^{(t)}_{PBIB}$. Consider
in particular that component $N^{(t)}_{PBIB}$ in this pair which contains the $(t + i_1)$th
associate of $\theta$ in $N$. The first row of this $N^{(t)}_{PBIB}$ corresponds to $\theta'$. Consider the
$i_2$th associates of $\theta'$ in this $N^{(t)}_{PBIB}$. These, considered as treatments of $N$, are
defined to be $(t + i_1 + i_2 s)$th associates of $\theta$ in $N$. In the replication of $N^{(t)}_{PBIB}$
under consideration they are $n_{2i_2}$ in number and each of them occurs together
with $\theta$ in $\lambda_{2i_2}$ blocks of $N$. Remembering that there are $\lambda_{1i_1}$ such replications of
$N^{(t)}_{PBIB}$ corresponding to each of the $n_{1i_1}$ $i_1$th associates of $\Theta$ in $N^{(s)}_{PBIB}$ it follows
that there are $n_{1i_1} n_{2i_2}$ $(t + i_1 + i_2 s)$th associates of $\theta$ in $N$ and each of them
occurs together with $\theta$ in $\lambda_{1i_1} \lambda_{2i_2}$ blocks of $N$. This identifies $ts$ further associate
classes of $\theta$ in $N$, and the parameters $n'_{t+i_1+i_2 s}$, $\lambda'_{t+i_1+i_2 s}$ of $N$ given by

(4.11)  $n'_{t+i_1+i_2 s} = n_{1i_1} n_{2i_2}, \quad \lambda'_{t+i_1+i_2 s} = \lambda_{1i_1} \lambda_{2i_2}$

are seen to have their usual significance for the design $N$.
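
The three kinds of concurrence numbers, $r_1 \lambda_{2i_2}$, $r_2 \lambda_{1i_1}$ and $\lambda_{1i_1} \lambda_{2i_2}$, can be read off from $NN'$, since $(A \times B)(A \times B)' = (AA') \times (BB')$. The sketch below is only an illustration: the two small designs and the helper names `kron` and `concurrence` are assumptions of this example. The first design has $s = 2$ classes with $r_1 = 2$, $n_{11} = 1$, $\lambda_{11} = 0$, $n_{12} = 2$, $\lambda_{12} = 1$; the second has $t = 1$ class with $r_2 = 2$, $n_{21} = 2$, $\lambda_{21} = 1$.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def concurrence(N):
    """N N': entry (x, y) counts the blocks containing both treatments x and y."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in N] for rx in N]


Na = [[1, 1, 0, 0],     # s = 2 associate classes (group divisible)
      [0, 0, 1, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1]]
Nb = [[1, 1, 0],        # t = 1 associate class (all pairs from 3 treatments)
      [1, 0, 1],
      [0, 1, 1]]

C = concurrence(kron(Na, Nb))
first_row = C[0]        # concurrences of the first treatment of N with all others
print(first_row)
```

In the first row, the leading entry is $r_1 r_2$; the next two entries are $r_1 \lambda_{21}$; each later group of three begins with $r_2 \lambda_{1i_1}$ and continues with $\lambda_{1i_1} \lambda_{21}$.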

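Now, since $\sum_{i_1=1}^{s} n_{1 i_1} = v_1 - 1$ and $\sum_{i_2=1}^{t} n_{2 i_2} = v_2 - 1$ (cf. [2]), the number
of treatments of $N$ accounted for in the above identification of the various associate
classes of $\theta$ in $N$ is

$\sum_{i_2=1}^{t} n_{2 i_2} + \sum_{i_1=1}^{s} n_{1 i_1} + \sum_{i_1=1}^{s} \sum_{i_2=1}^{t} n_{1 i_1} n_{2 i_2} = \Bigl\{1 + \sum_{i_1=1}^{s} n_{1 i_1}\Bigr\}\Bigl\{1 + \sum_{i_2=1}^{t} n_{2 i_2}\Bigr\} - 1 = v_1 v_2 - 1,$

which together with $\theta$ exhausts all the $v_1 v_2$ treatments of $N$.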

It may also be observed that if $1 \le i_1 \le s$ and $1 \le i_2 \le t$, then any integer
$m$ such that $t + s < m \le t + s + ts$ can be uniquely written in the form $m =
t + i_1 + i_2 s$, that is, if $m = t + i_1 + i_2 s$ and if $m = t + i_1' + i_2' s$, then we
must have $i_1 = i_1'$ and $i_2 = i_2'$. This fact ensures the uniqueness of the enumeration
of the associate classes of $\theta$ in $N$, as described above.
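The uniqueness of the indexing $m = t + i_1 + i_2 s$ can be checked mechanically for small cases. The sketch below is only an illustration; the function name `product_class_indices` and the sample values of $s$ and $t$ are hypothetical choices, not part of the paper.

```python
def product_class_indices(s, t):
    """Map each pair (i1, i2), 1 <= i1 <= s, 1 <= i2 <= t, to m = t + i1 + i2*s."""
    return {(i1, i2): t + i1 + i2 * s
            for i1 in range(1, s + 1)
            for i2 in range(1, t + 1)}


# Illustrative values only: s = 3 classes in the first design, t = 2 in the second.
s, t = 3, 2
indices = product_class_indices(s, t)

# The ts indices are pairwise distinct and fill the range t+s+1, ..., t+s+ts.
assert sorted(indices.values()) == list(range(t + s + 1, t + s + t * s + 1))
print(sorted(indices.values()))  # [6, 7, 8, 9, 10, 11]
```

Because $i_1$ runs over a full set of residues $1, \ldots, s$ before $i_2$ changes, the map behaves like a mixed-radix numbering, which is why no index is hit twice.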

This proves that the design $N$ has at most $t + s + ts$ associate classes.

We now calculate the parameters $p^x_{yz}$, $x, y, z = 1, 2, \ldots, t + s + ts$, of
the design $N$.
Consider the treatment $\theta$ of $N$ which corresponds to the first row of $N$. Let
$\phi$ be an $i_2$th associate of $\theta$ in $N$, $i_2 = 1, 2, \ldots, t$. Then clearly the row in $N$
corresponding to $\phi$ is contained in the first $v_2$ rows of $N$ and these $v_2$ rows of $N$
contain among themselves exactly $r_1$ replications of $N^{(2)}_{\mathrm{PBIB}}$ as a whole and nothing
else. The first row in any one of these replications of $N^{(2)}_{\mathrm{PBIB}}$, which is a section of
that corresponding to $\theta$ in $N$, corresponds to the treatment $\theta'$ of $N^{(2)}_{\mathrm{PBIB}}$. This
replication of $N^{(2)}_{\mathrm{PBIB}}$ also contains a row which is a section of the row corresponding
to $\phi$ in $N$, and this row of $N^{(2)}_{\mathrm{PBIB}}$ corresponds to the treatment $\phi'$ of $N^{(2)}_{\mathrm{PBIB}}$.
Clearly $\theta'$ and $\phi'$ are $i_2$th associates of each other in $N^{(2)}_{\mathrm{PBIB}}$. Now there are $p^{i_2}_{2 j_2 k_2}$
treatments of $N^{(2)}_{\mathrm{PBIB}}$ which are in common with the $j_2$th associates of $\theta'$ and the
$k_2$th associates of $\phi'$ in $N^{(2)}_{\mathrm{PBIB}}$, for $j_2, k_2 = 1, 2, \ldots, t$. It is clear that exactly
these $p^{i_2}_{2 j_2 k_2}$ treatments in the replication of $N^{(2)}_{\mathrm{PBIB}}$ under consideration, considered
as treatments of $N$, are those which are in common with the $j_2$th associates of $\theta$
and the $k_2$th associates of $\phi$ in $N$. Hence

(4.12) $p^{i_2}_{j_2 k_2} = p^{i_2}_{2 j_2 k_2}, \quad i_2, j_2, k_2 = 1, 2, \ldots, t.$


Also observe that the first $v_2$ rows of $N$ contain all the first $t$ associate classes of
$\theta$, and also of $\phi$, and only these. Hence we must have

(4.13) $p^{i_2}_{j_2 u} = 0; \quad i_2, j_2 = 1, 2, \ldots, t; \; u = t + 1, t + 2, \ldots, t + s + ts.$


Next consider the $(t + j_1)$th associates of $\theta$ in $N$. They are the treatments of
$N$ corresponding to those rows in $N$ which correspond to $\theta'$ in the replications
of $N^{(2)}_{\mathrm{PBIB}}$ arising from each of the $j_1$th associates of $\Theta$ in $N^{(1)}_{\mathrm{PBIB}}$. Similarly the
$(t + k_1)$th associates of $\phi$ in $N$ are the treatments of $N$ which correspond to those
rows in $N$ which correspond to $\phi'$ in the replications of $N^{(2)}_{\mathrm{PBIB}}$ arising from each
of the $k_1$th associates of $\Theta$ in $N^{(1)}_{\mathrm{PBIB}}$. Since the treatments $\theta'$ and $\phi'$ are distinct
we must have

(4.14) $p^{i_2}_{t+j_1,\, t+k_1} = 0, \quad j_1, k_1 = 1, 2, \ldots, s.$


Again the $(t + k_1 + k_2 s)$th associates of $\phi$ in $N$ are the $k_2$th associates of
$\phi'$ in the replications of $N^{(2)}_{\mathrm{PBIB}}$ arising out of the $k_1$th associates of $\Theta$ in $N^{(1)}_{\mathrm{PBIB}}$.
To calculate the value of $p^{i_2}_{t+j_1,\, t+k_1+k_2 s}$ we have to count the number of treatments
of $N$ which are in common with the $n_{1 j_1}$ $(t + j_1)$th associates of $\theta$ and the
$n_{1 k_1} n_{2 k_2}$ $(t + k_1 + k_2 s)$th associates of $\phi$ in $N$. It is clear, from the way in
which these associate classes are defined, that

$p^{i_2}_{t+j_1,\, t+k_1+k_2 s} = 0$ if $j_1 \ne k_1$,
$p^{i_2}_{t+j_1,\, t+j_1+k_2 s} = 0$ if $i_2 \ne k_2$,
$p^{i_2}_{t+j_1,\, t+j_1+i_2 s} = n_{1 j_1}$.

It may be easily seen that the above relations can be written in the form

$p^{i_2}_{t+j_1,\, t+k_1+k_2 s} = \delta_{i_2 k_2} \cdot n_{1 j_1} \delta_{j_1 k_1}, \quad j_1, k_1 = 1, 2, \ldots, s; \; k_2 = 1, 2, \ldots, t,$

and since the indices $j_1$, $k_1$ have to run over their entire ranges before the index
$k_2$ can change its value, we can write the above equation in the matrix form

(4.15) $(p^{i_2}_{t+j_1,\, t+k_1+k_2 s}) = (\delta_{i_2 k_2}) \times (n_{1 j_1} \delta_{j_1 k_1}).$
Finally let us consider the $(t + j_1 + j_2 s)$th associates of $\theta$ in $N$ and the
$(t + k_1 + k_2 s)$th associates of $\phi$ in $N$. The number of treatments of $N$ in common with these two
associate classes is $p^{i_2}_{t+j_1+j_2 s,\, t+k_1+k_2 s}$. From the definitions of these two associate
classes we find that

$p^{i_2}_{t+j_1+j_2 s,\, t+k_1+k_2 s} = 0$ if $j_1 \ne k_1$,
$p^{i_2}_{t+j_1+j_2 s,\, t+j_1+k_2 s} = n_{1 j_1}\, p^{i_2}_{2 j_2 k_2}$.

These relations are easily seen to be equivalent to writing

$p^{i_2}_{t+j_1+j_2 s,\, t+k_1+k_2 s} = p^{i_2}_{2 j_2 k_2} \cdot n_{1 j_1} \delta_{j_1 k_1}, \quad j_1, k_1 = 1, 2, \ldots, s; \; j_2, k_2 = 1, 2, \ldots, t,$

and since here also the indices $j_1$, $k_1$ have to run over their entire ranges before
the indices $j_2$, $k_2$ can change their values, it follows that we can write the above
equations in the matrix form

(4.16) $(p^{i_2}_{t+j_1+j_2 s,\, t+k_1+k_2 s}) = (p^{i_2}_{2 j_2 k_2}) \times (n_{1 j_1} \delta_{j_1 k_1}).$


Combining the calculation in (4.12) to (4.16) and remembering that $p^{i_2}_{yz} =
p^{i_2}_{zy}$, $i_2 = 1, 2, \ldots, t$; $y, z = 1, 2, \ldots, t + s + ts$, we get the first of the three
matrices in (4.7).

Similar calculations will give the other two matrices in (4.7). 

Thus the argument so far together with the results in (4.8) to (4.16) prove 
the statements (a) and (c) of Theorem 4.2. 

Also from the way in which we have defined the various associate classes in $N$,
we find that if the $s$ associate classes of $N^{(1)}_{\mathrm{PBIB}}$ and the $t$ associate classes of $N^{(2)}_{\mathrm{PBIB}}$
are all distinct, then the first $t$ associate classes of $N$ are all distinct, the next $s$
associate classes of $N$ are all distinct, and the last $ts$ associate classes of $N$ are
all distinct. Further suppose that $\lambda_{1 i_1} = r_1$ for some $i_1$, $1 \le i_1 \le s$. Then from
(4.9) and (4.11) we find that $\lambda_{i_2} = \lambda_{t+i_1+i_2 s}$, $i_1$ fixed; $i_2 = 1, 2, \ldots, t$; hence it
may be possible to combine some of the corresponding associate classes. Similarly
if $\lambda_{2 i_2} = r_2$ for some $i_2$, $1 \le i_2 \le t$, then from (4.10) and (4.11) we find
that $\lambda_{t+i_1} = \lambda_{t+i_1+i_2 s}$, $i_2$ fixed; $i_1 = 1, 2, \ldots, s$; hence it may be possible to
combine some of the corresponding associate classes. But if $\lambda_{1 i_1} < r_1$ and $\lambda_{2 i_2} <
r_2$ for all $i_1 = 1, 2, \ldots, s$ and $i_2 = 1, 2, \ldots, t$, no such situation can arise and
then the first $t$ associate classes and the next $s$ associate classes are distinct from
the last $ts$ associate classes of $N$. Finally if $r_1 \lambda_{2 i_2} \ne r_2 \lambda_{1 i_1}$ for all $i_1 = 1, 2, \ldots, s$
and $i_2 = 1, 2, \ldots, t$, then the first $t$ associate classes are distinct from the next
$s$ associate classes of $N$ because of (4.9) and (4.10). This means that if the conditions
in the statement (b) of Theorem 4.2 are satisfied, then the $t + s + ts$
associate classes of $N$ are all distinct. This proves the statement (b) of Theorem
4.2.

Lastly the statement (d) of Theorem 4.2 is simply a provision for the cases
to which the statement (b) of Theorem 4.2 does not apply.

This completes the proof of Theorem 4.2. 

Although the following corollary is an obvious special case of Theorem 4.2 
we state it separately because we shall require it for further investigation. 

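COROLLARY 4.2.1. (a) The Kronecker product $N = N^{(1)}_{\mathrm{BIB}} \times N^{(2)}_{\mathrm{BIB}}$ of the
two BIB designs $N^{(1)}_{\mathrm{BIB}}$ and $N^{(2)}_{\mathrm{BIB}}$ defined by the respective sets of parameters

(4.17) $v_1^*, b_1^*, r_1^*, k_1^*, \lambda_1^*,$
(4.18) $v_2^*, b_2^*, r_2^*, k_2^*, \lambda_2^*$

is a PBIB design $N$ with at most three associate classes.
(b) The three associate classes of the design $N$ defined above are all distinct if
$r_1^* \lambda_2^* \ne r_2^* \lambda_1^*$.
(c) In any case the parameters of the design $N$ can be expressed in terms of those
of $N^{(1)}_{\mathrm{BIB}}$ and $N^{(2)}_{\mathrm{BIB}}$ by the equations

(4.19)
$v' = v_1^* v_2^*, \quad b' = b_1^* b_2^*, \quad r' = r_1^* r_2^*, \quad k' = k_1^* k_2^*,$

$n_1 = v_2^* - 1, \quad n_2 = v_1^* - 1, \quad n_3 = (v_1^* - 1)(v_2^* - 1),$

$\lambda_1 = r_1^* \lambda_2^*, \quad \lambda_2 = r_2^* \lambda_1^*, \quad \lambda_3 = \lambda_1^* \lambda_2^*,$

$(p^1_{yz}) = \begin{pmatrix} v_2^* - 2 & 0 & 0 \\ 0 & 0 & v_1^* - 1 \\ 0 & v_1^* - 1 & (v_1^* - 1)(v_2^* - 2) \end{pmatrix},$

$(p^2_{yz}) = \begin{pmatrix} 0 & 0 & v_2^* - 1 \\ 0 & v_1^* - 2 & 0 \\ v_2^* - 1 & 0 & (v_1^* - 2)(v_2^* - 1) \end{pmatrix},$

$(p^3_{yz}) = \begin{pmatrix} 0 & 1 & v_2^* - 2 \\ 1 & 0 & v_1^* - 2 \\ v_2^* - 2 & v_1^* - 2 & (v_1^* - 2)(v_2^* - 2) \end{pmatrix},$

where $y, z = 1, 2, 3$.

(d) Lemma 4.1 can be applied to the cases in which the condition in (b) above is
not fulfilled.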

We shall now obtain the conditions under which the Kronecker product $N$
of two BIB designs $N^{(1)}_{\mathrm{BIB}}$ and $N^{(2)}_{\mathrm{BIB}}$, defined in Corollary 4.2.1 (a), is a
PBIB design with only two distinct associate classes.

Since $\lambda_1^* < r_1^*$ and $\lambda_2^* < r_2^*$, it is clear that the first necessary condition is that
$r_1^* \lambda_2^* = r_2^* \lambda_1^*$. In this case applying Lemma 4.1 to the first two matrices in (4.19)
we find that the second necessary condition is that $v_1^* = v_2^*$. It is clear from
Lemma 4.1 that these conditions are also sufficient for the design (4.19) to have
only two distinct associate classes.

If the conditions $v_1^* = v_2^*$ and $r_1^* \lambda_2^* = r_2^* \lambda_1^*$ are satisfied, then from the relations
among the parameters of BIB designs (cf. [11]), we must have $k_1^* = k_2^*$.
Conversely, if we assume that $v_1^* = v_2^*$ and $k_1^* = k_2^*$, then we can deduce that
$r_1^* \lambda_2^* = r_2^* \lambda_1^*$. This means that the conditions

(4.20) $v_1^* = v_2^*, \quad k_1^* = k_2^*$

are equivalent to the conditions

(4.21) $v_1^* = v_2^*, \quad r_1^* \lambda_2^* = r_2^* \lambda_1^*,$

and hence either (4.20) or (4.21) are necessary and sufficient conditions for the
design (4.19) to have only two distinct associate classes. Under (4.20) or (4.21)
we can further deduce that

(4.22) $\dfrac{b_2^*}{b_1^*} = \dfrac{r_2^*}{r_1^*} = \dfrac{\lambda_2^*}{\lambda_1^*}.$
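The equivalence of (4.20) and (4.21), and the ratios in (4.22), can be traced in a few lines. The sketch below assumes that the relations among BIB parameters referred to in the text are the two standard identities $b k = v r$ and $\lambda (v - 1) = r (k - 1)$, with $v_i^* > 1$ and $k_i^* > 1$; this reading is an assumption, not a quotation from [11].

```latex
\begin{align*}
\lambda_i^* (v_i^* - 1) = r_i^* (k_i^* - 1)
  &\;\Longrightarrow\; \frac{r_i^*}{\lambda_i^*} = \frac{v_i^* - 1}{k_i^* - 1},
  \qquad i = 1, 2, \\
\text{if } v_1^* = v_2^*: \quad
r_1^* \lambda_2^* = r_2^* \lambda_1^*
  &\iff \frac{v_1^* - 1}{k_1^* - 1} = \frac{v_2^* - 1}{k_2^* - 1}
  \iff k_1^* = k_2^*, \\
b_i^* k_i^* = v_i^* r_i^* \text{ with } v_1^* = v_2^*,\ k_1^* = k_2^*
  &\;\Longrightarrow\; \frac{b_2^*}{b_1^*} = \frac{r_2^*}{r_1^*}
  = \frac{\lambda_2^*}{\lambda_1^*}.
\end{align*}
```

The last equality of the third line is the second line read as a ratio.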
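These results are stated in the following corollary.

COROLLARY 4.2.2. The necessary and sufficient conditions for the Kronecker
product $N = N^{(1)}_{\mathrm{BIB}} \times N^{(2)}_{\mathrm{BIB}}$ of the BIB designs $N^{(1)}_{\mathrm{BIB}}$ and $N^{(2)}_{\mathrm{BIB}}$ with respective
sets of parameters

$v_1^*, b_1^*, r_1^*, k_1^*, \lambda_1^*,$

$v_2^*, b_2^*, r_2^*, k_2^*, \lambda_2^*,$

to have only two distinct associate classes are

$v_1^* = v_2^* = v$, say, $\quad k_1^* = k_2^* = k$, say.

If these conditions are satisfied then we have $b_2^*/b_1^* = r_2^*/r_1^* = \lambda_2^*/\lambda_1^* = \mu$, say,
where $\mu$ is a positive fraction, and in this case the parameters of $N$ are expressed in
terms of those of $N^{(1)}_{\mathrm{BIB}}$ and $N^{(2)}_{\mathrm{BIB}}$ by the equations

(4.23)
$v' = v^2, \quad b' = \mu (b_1^*)^2, \quad r' = \mu (r_1^*)^2, \quad k' = k^2,$

$n_1 = 2(v - 1), \quad n_2 = (v - 1)^2, \quad \lambda_1 = \mu r_1^* \lambda_1^*, \quad \lambda_2 = \mu (\lambda_1^*)^2,$

$(p^1_{jk}) = \begin{pmatrix} v - 2 & v - 1 \\ v - 1 & (v - 1)(v - 2) \end{pmatrix}, \quad (p^2_{jk}) = \begin{pmatrix} 2 & 2(v - 2) \\ 2(v - 2) & (v - 2)^2 \end{pmatrix}.$

The following definition of a cyclic design is given by Bose and Shimamoto [9].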


Consider a PBIB design $N_{\mathrm{PBIB}}$ with two associate classes and with parameters
$v, b, r, k, n_i, \lambda_i, p^i_{jk}$, $i, j, k = 1, 2$. Let its treatments be designated by the
integers $1, 2, \ldots, v$. The design $N_{\mathrm{PBIB}}$ is said to be a C (cyclic) design if the first
associates of the treatment $i$ of $N_{\mathrm{PBIB}}$ regarded as a PBIB design are the treatments

$i + d_1, \; i + d_2, \; \ldots, \; i + d_{n_1} \pmod v$

where the $d$'s satisfy the conditions:

(i) the $d$'s are all different and $0 < d_j < v$ for $j = 1, 2, \ldots, n_1$;

(ii) among the $n_1(n_1 - 1)$ differences $d_j - d_{j'}$, $j, j' = 1, 2, \ldots, n_1$; $j \ne j'$;
reduced mod $v$ each of the numbers $d_1, d_2, \ldots, d_{n_1}$ occurs $\alpha$ times whereas each
of the numbers $e_1, e_2, \ldots, e_{n_2}$ occurs $\beta$ times where $d_1, d_2, \ldots, d_{n_1}, e_1, e_2, \ldots,
e_{n_2}$ are all the different $v - 1$ numbers $1, 2, \ldots, v - 1$.

Clearly it is necessary that

(4.24) $n_1 \alpha + n_2 \beta = n_1(n_1 - 1).$

The parameters $p^i_{jk}$, $i, j, k = 1, 2$, of $N_{\mathrm{PBIB}}$ are in this case given by

(4.25) $(p^1_{jk}) = \begin{pmatrix} \alpha & n_1 - \alpha - 1 \\ n_1 - \alpha - 1 & n_2 - n_1 + \alpha + 1 \end{pmatrix}, \quad (p^2_{jk}) = \begin{pmatrix} \beta & n_1 - \beta \\ n_1 - \beta & n_2 - n_1 + \beta - 1 \end{pmatrix}.$

If we take $\alpha = v - 2$ and $\beta = 2$, then we find that the necessary conditions
(4.24) and (4.25) are satisfied by the corresponding parameters in (4.23). Let
the treatments of the design $N$ of (4.23) be designated by integers $1, 2, \ldots, v^2$.
Then, according to the method of identification of the various associate classes
in $N$ described in the proof of Theorem 4.2, the first associates of the treatment
1 in $N$ are treatments $2, 3, \ldots, v, v + 1, 2v + 1, \ldots, v^2 - v + 1$. The corresponding
$d$'s are clearly $1, 2, \ldots, v - 1, v, 2v, \ldots, v^2 - v$. If we form the
$2(v - 1)(2v - 3)$ differences $d_j - d_{j'}$, $j, j' = 1, 2, \ldots, 2(v - 1)$; $j \ne j'$; of these $2(v - 1)$ $d$'s,
it is obvious that $d_1 = 1$ will occur in these differences exactly $v - 1$ times
whereas if the design $N$ of (4.23) were a cyclic design, $d_1 = 1$ must occur only
$\alpha = v - 2$ times. Hence we find that the design $N$ of (4.23) cannot be a cyclic
design even though its parameters satisfy the necessary conditions (4.24) and
(4.25).
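The count of the difference 1 among the $d$'s can be confirmed by brute force. The following sketch is an illustration only: the function name `difference_counts` is hypothetical, and the differences are reduced modulo $v^2$, the number of treatments of the product design.

```python
from collections import Counter


def difference_counts(v):
    """Return the d's of the product design and the counts of d_j - d_j' (mod v^2)."""
    modulus = v * v
    # The d's listed in the text: 1, ..., v-1 followed by v, 2v, ..., v^2 - v.
    d = list(range(1, v)) + [v * m for m in range(1, v)]
    counts = Counter((x - y) % modulus for x in d for y in d if x != y)
    return d, counts


d, counts = difference_counts(4)
# There are 2(v-1) d's, and the difference 1 occurs v-1 times rather than v-2.
print(len(d), counts[1])  # prints "6 3"
```

For $v = 4$ the value $\alpha = v - 2 = 2$ would be required of a cyclic design, while the difference 1 is produced three times, by the pairs (2, 1), (3, 2) and (4, 3).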


5. Construction of certain PBIB designs. From the results of Section 4 we
find that two Kronecker products which give PBIB designs with two associate
classes are

(i) $N_0 \times N_{\mathrm{BIB}}$ (Corollary 4.1.1),

(ii) $N^{(1)}_{\mathrm{BIB}} \times N^{(2)}_{\mathrm{BIB}}$ where $N^{(1)}_{\mathrm{BIB}}$ and $N^{(2)}_{\mathrm{BIB}}$ are two BIB designs for
which $v_1^* = v_2^*$ and $k_1^* = k_2^*$ (Corollary 4.2.2).

Bose, Shrikhande, and Bhattacharya [13] have obtained certain singular GD
designs by applying Corollary 4.1.1, which is the only way of getting them.
The following example illustrates Corollary 4.2.2.

EXAMPLE 5.1. Let us take $N^{(1)}_{\mathrm{BIB}}$ to be the BIB design defined by the parameters

(5.1) $v^* = b^* = 4, \quad r^* = k^* = 3, \quad \lambda^* = 2$

(cf. Cochran and Cox [6]). Let $N^{(2)}_{\mathrm{BIB}}$ be the same as $N^{(1)}_{\mathrm{BIB}}$, so that the value of
$\mu$ in Corollary 4.2.2 is 1. Then clearly the parameters $v', b', r', k'$ of

$N = N^{(1)}_{\mathrm{BIB}} \times N^{(2)}_{\mathrm{BIB}}$

are

(5.2) $v' = b' = 16, \quad r' = k' = 9.$

Let the treatments of $N$ be designated by the integers $1, 2, \ldots, 16$. The proof
of Theorem 4.2 contains a description of the method of identifying the various
associate classes of $N$. According to this method we get the following identifications.


Treatment          | 1                         | 2                         | 6
First associates   | 2, 3, 4, 5, 9, 13         | 1, 3, 4, 6, 10, 14        | 2, 5, 7, 8, 10, 14
Second associates  | 6, 7, 8, 10, 11, 12, 14,  | 5, 7, 8, 9, 11, 12, 13,   | 1, 3, 4, 9, 11, 12, 13,
                   | 15, 16                    | 15, 16                    | 15, 16





It is clear that the parameters $n_1, n_2, \lambda_1, \lambda_2$ of $N$ are given by

(5.3) $n_1 = 6, \quad n_2 = 9, \quad \lambda_1 = 6, \quad \lambda_2 = 4.$

Further the comparisons of the associate classes of treatment 1 with those of
treatments 2 and 6 respectively lead to

(5.4) $(p^1_{jk}) = \begin{pmatrix} 2 & 3 \\ 3 & 6 \end{pmatrix}, \quad (p^2_{jk}) = \begin{pmatrix} 2 & 4 \\ 4 & 4 \end{pmatrix},$

where $p^i_{jk}$, $i, j, k = 1, 2$, are parameters of $N$. It may be noted that the design
$N$ is not cyclic even though the necessary conditions (4.24) and (4.25) are satisfied
by its parameters.

The equations (5.2) to (5.4) give all the parameters of the design $N$. The blocks
of the design $N$ are shown below.

(1, 2, 3, 5, 6, 7, 9, 10, 11),        (1, 2, 3, 5, 6, 7, 13, 14, 15),
(1, 2, 4, 5, 6, 8, 9, 10, 12),        (1, 2, 4, 5, 6, 8, 13, 14, 16),
(1, 3, 4, 5, 7, 8, 9, 11, 12),        (1, 3, 4, 5, 7, 8, 13, 15, 16),
(2, 3, 4, 6, 7, 8, 10, 11, 12),       (2, 3, 4, 6, 7, 8, 14, 15, 16),
(1, 2, 3, 9, 10, 11, 13, 14, 15),     (5, 6, 7, 9, 10, 11, 13, 14, 15),
(1, 2, 4, 9, 10, 12, 13, 14, 16),     (5, 6, 8, 9, 10, 12, 13, 14, 16),
(1, 3, 4, 9, 11, 12, 13, 15, 16),     (5, 7, 8, 9, 11, 12, 13, 15, 16),
(2, 3, 4, 10, 11, 12, 14, 15, 16),    (6, 7, 8, 10, 11, 12, 14, 15, 16).
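The parameters in (5.2) and (5.3) can be reproduced numerically. The sketch below is an illustration under stated assumptions: it takes the BIB design of (5.1) to be the design on four treatments whose blocks each omit one treatment (a standard design with these parameters), and the helper name `complement_of_points_design` is hypothetical. Treatments are numbered from 0 in the code, and the blocks come out in a different order from the listing above.

```python
import numpy as np


def complement_of_points_design(v):
    """Treatments-by-blocks incidence matrix; block j contains every treatment except j."""
    return np.ones((v, v), dtype=int) - np.eye(v, dtype=int)


N1 = complement_of_points_design(4)   # v = b = 4, r = k = 3, lambda = 2
N = np.kron(N1, N1)                   # incidence matrix of the Kronecker product design

v, b = N.shape
r = {int(x) for x in N.sum(axis=1)}   # replication numbers of the treatments
k = {int(x) for x in N.sum(axis=0)}   # block sizes

# Entry (i, j) of N N^T is the number of blocks containing both treatments i and j.
concurrence = N @ N.T
off_diagonal = concurrence[~np.eye(v, dtype=bool)]
values, counts = np.unique(off_diagonal, return_counts=True)
# For each concurrence value, the number of treatments having it with a fixed treatment.
per_treatment = dict(zip(values.tolist(), (counts // v).tolist()))

print(v, b, r, k, per_treatment)  # 16 16 {9} {9} {4: 9, 6: 6}
```

The dictionary reads: 6 treatments occur with a given treatment in 6 blocks, and 9 treatments occur with it in 4 blocks, in agreement with $n_1 = 6$, $\lambda_1 = 6$, $n_2 = 9$, $\lambda_2 = 4$.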


From the remaining results of Section 4 we find that the following Kronecker
products give PBIB designs with more than two associate classes.
(i) $N_0 \times N_{\mathrm{PBIB}}$ (Theorem 4.1),
(ii) $N^{(1)}_{\mathrm{PBIB}} \times N^{(2)}_{\mathrm{PBIB}}$ (Theorem 4.2),
(iii) $N^{(1)}_{\mathrm{BIB}} \times N^{(2)}_{\mathrm{BIB}}$ (Corollary 4.2.1).
The following example illustrates Corollary 4.2.1.

EXAMPLE 5.2. Let $N^{(1)}_{\mathrm{BIB}}$ and $N^{(2)}_{\mathrm{BIB}}$ be the BIB designs defined by the sets of
parameters

(5.5) $v_1^* = b_1^* = 3, \quad r_1^* = k_1^* = 2, \quad \lambda_1^* = 1,$

(5.6) $v_2^* = 5, \quad b_2^* = 10, \quad r_2^* = 4, \quad k_2^* = 2, \quad \lambda_2^* = 1,$

respectively (cf. Cochran and Cox [6]). Then clearly the parameters $v', b', r', k'$
of $N = N^{(1)}_{\mathrm{BIB}} \times N^{(2)}_{\mathrm{BIB}}$ are given by

(5.7) $v' = 15, \quad b' = 30, \quad r' = 8, \quad k' = 4.$


Let the treatments of $N$ be designated by integers $1, 2, \ldots, 15$. According
to the method of identifying the various associate classes of $N$ described in the
proof of Theorem 4.2, we get the following identifications.

Treatment          | 1                   | 2                   | 6                   | 7
First associates   | 2, 3, 4, 5          | 1, 3, 4, 5          | 7, 8, 9, 10         | 6, 8, 9, 10
Second associates  | 6, 11               | 7, 12               | 1, 11               | 2, 12
Third associates   | 7, 8, 9, 10, 12,    | 6, 8, 9, 10, 11,    | 2, 3, 4, 5, 12,     | 1, 3, 4, 5, 11,
                   | 13, 14, 15          | 13, 14, 15          | 13, 14, 15          | 13, 14, 15

Also it is clear that the parameters $n_1, n_2, n_3, \lambda_1, \lambda_2, \lambda_3$ of $N$ are given by

(5.8) $n_1 = 4, \quad n_2 = 2, \quad n_3 = 8, \quad \lambda_1 = 2, \quad \lambda_2 = 4, \quad \lambda_3 = 1.$

Further, the comparisons of the associate classes of treatment 1 with those of
treatments 2, 6, and 7 respectively lead to

(5.9) $(p^1_{jk}) = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 2 \\ 0 & 2 & 6 \end{pmatrix}, \quad (p^2_{jk}) = \begin{pmatrix} 0 & 0 & 4 \\ 0 & 1 & 0 \\ 4 & 0 & 4 \end{pmatrix}, \quad (p^3_{jk}) = \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 1 \\ 3 & 1 & 3 \end{pmatrix},$

where $p^i_{jk}$, $i, j, k = 1, 2, 3$, are parameters of $N$.
The equations (5.7) to (5.9) give all the parameters of the design $N$. The
blocks of the design are as shown below.

(1, 2, 6, 7),    (1, 3, 6, 8),    (1, 4, 6, 9),    (1, 5, 6, 10),    (2, 3, 7, 8),
(2, 4, 7, 9),    (2, 5, 7, 10),   (3, 4, 8, 9),    (3, 5, 8, 10),    (4, 5, 9, 10),
(1, 2, 11, 12),  (1, 3, 11, 13),  (1, 4, 11, 14),  (1, 5, 11, 15),   (2, 3, 12, 13),
(2, 4, 12, 14),  (2, 5, 12, 15),  (3, 4, 13, 14),  (3, 5, 13, 15),   (4, 5, 14, 15),
(6, 7, 11, 12),  (6, 8, 11, 13),  (6, 9, 11, 14),  (6, 10, 11, 15),  (7, 8, 12, 13),
(7, 9, 12, 14),  (7, 10, 12, 15), (8, 9, 13, 14),  (8, 10, 13, 15),  (9, 10, 14, 15).
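The association scheme of this example can be recomputed from the two component designs. The sketch below is an illustration only: it assumes, as the block listing suggests, that both component designs consist of all pairs of treatments, and the helper names `all_pairs_design`, `assoc_class` and `p_matrix` are hypothetical. Treatments are numbered from 0, so treatments 1, 2, 6, 7 of the text are 0, 1, 5, 6 in the code.

```python
import numpy as np
from itertools import combinations


def all_pairs_design(v):
    """Treatments-by-blocks incidence matrix of the design whose blocks are all pairs."""
    blocks = list(combinations(range(v), 2))
    N = np.zeros((v, len(blocks)), dtype=int)
    for j, block in enumerate(blocks):
        N[list(block), j] = 1
    return N


v1, v2 = 3, 5
N = np.kron(all_pairs_design(v1), all_pairs_design(v2))   # 15 treatments, 30 blocks


def assoc_class(x, y):
    """Class of the pair (x, y), x != y: 1 same first index, 2 same second index, 3 neither."""
    if x // v2 == y // v2:
        return 1
    return 2 if x % v2 == y % v2 else 3


def p_matrix(x, y):
    """Entry (j-1, k-1) counts treatments that are j-th associates of x and k-th of y."""
    P = np.zeros((3, 3), dtype=int)
    for z in range(v1 * v2):
        if z not in (x, y):
            P[assoc_class(x, z) - 1, assoc_class(y, z) - 1] += 1
    return P


concurrence = N @ N.T
lam = {assoc_class(0, y): int(concurrence[0, y]) for y in range(1, v1 * v2)}
print(lam)                                   # {1: 2, 2: 4, 3: 1}
print(p_matrix(0, 1), p_matrix(0, 5), p_matrix(0, 6), sep="\n")
```

The three printed matrices are the intersection numbers obtained by comparing treatment 1 with treatments 2, 6 and 7, and the dictionary gives the number of blocks in which treatment 1 meets an associate of each class.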
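EXAMPLE 5.3. Consider the PBIB design $N$ with three associate classes and
with parameters:

(5.10)
$v = b = pq, \quad r = k = p + q - 1,$

$n_1 = p - 1, \quad n_2 = q - 1, \quad n_3 = (p - 1)(q - 1),$

$\lambda_1 = p, \quad \lambda_2 = q, \quad \lambda_3 = 2,$

$(p^1_{jk}) = \begin{pmatrix} p - 2 & 0 & 0 \\ 0 & 0 & q - 1 \\ 0 & q - 1 & (p - 2)(q - 1) \end{pmatrix},$

$(p^2_{jk}) = \begin{pmatrix} 0 & 0 & p - 1 \\ 0 & q - 2 & 0 \\ p - 1 & 0 & (p - 1)(q - 2) \end{pmatrix},$

$(p^3_{jk}) = \begin{pmatrix} 0 & 1 & p - 2 \\ 1 & 0 & q - 2 \\ p - 2 & q - 2 & (p - 2)(q - 2) \end{pmatrix},$

where $p$, $q$ are positive integers $\ge 2$ and $j, k = 1, 2, 3$.

This design taken from Bose and Nair [2] very much resembles the Kronecker
product of two BIB designs. Let us suppose that the above design is the Kronecker
product of the two BIB designs defined by the sets of parameters

$v_1^* = q, \; b_1^*, \; r_1^*, \; k_1^*, \; \lambda_1^*$

and

$v_2^* = p, \; b_2^*, \; r_2^*, \; k_2^*, \; \lambda_2^*.$

From Corollary 4.2.1 it follows that we must have

$\lambda_1 = r_1^* \lambda_2^* = p, \quad \lambda_2 = r_2^* \lambda_1^* = q, \quad \lambda_3 = \lambda_1^* \lambda_2^* = 2.$

Hence

$pq = \lambda_1 \lambda_2 = \lambda_1^* \lambda_2^* r_1^* r_2^* = 2 r_1^* r_2^* = 2r = 2(p + q - 1),$

which leads to

(5.11) $(p - 2)(q - 2) = 2.$

This means that a necessary condition for the design $N$ of (5.10) to be the
Kronecker product of two BIB designs is (5.11).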

Also since $p$ and $q$ are positive integers, we must have from (5.11) either $p =
3$ and $q = 4$ or $p = 4$ and $q = 3$. It is enough to consider one case, say, $p = 4$,
$q = 3$. With these values it is clear that (5.11) is satisfied. The corresponding
BIB designs are defined by the sets of parameters

$v_1^* = b_1^* = 3, \quad r_1^* = k_1^* = 2, \quad \lambda_1^* = 1,$

and

$v_2^* = b_2^* = 4, \quad r_2^* = k_2^* = 3, \quad \lambda_2^* = 2.$

Now applying Corollary 4.2.1 to these two BIB designs it is easily verified
that their Kronecker product has the parameters of design $N$ of (5.10) for which
$p = 4$ and $q = 3$. Thus we find that the condition (5.11) is also sufficient for the
design $N$ of (5.10) to be constructible as the Kronecker product of two BIB designs.
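The verification mentioned above is a matter of arithmetic and can be sketched as follows. The function names and the dictionary layout are hypothetical, the product formulas are those of Corollary 4.2.1 as read here, and the search bound of 50 is an arbitrary illustrative limit (the factorisation of 2 already forces the two solutions).

```python
def kronecker_bib_parameters(d1, d2):
    """Parameters of the product of two BIB designs, each given as (v, b, r, k, lam)."""
    v1, b1, r1, k1, l1 = d1
    v2, b2, r2, k2, l2 = d2
    return {
        "v": v1 * v2, "b": b1 * b2, "r": r1 * r2, "k": k1 * k2,
        "n": (v2 - 1, v1 - 1, (v1 - 1) * (v2 - 1)),
        "lam": (r1 * l2, r2 * l1, l1 * l2),
    }


def design_5_10(p, q):
    """First- and second-kind parameters of the design (5.10)."""
    return {
        "v": p * q, "b": p * q, "r": p + q - 1, "k": p + q - 1,
        "n": (p - 1, q - 1, (p - 1) * (q - 1)),
        "lam": (p, q, 2),
    }


# Integer solutions of (5.11) with p, q >= 2.
solutions = [(p, q) for p in range(2, 50) for q in range(2, 50) if (p - 2) * (q - 2) == 2]
print(solutions)  # [(3, 4), (4, 3)]

# The two BIB designs of the example reproduce (5.10) with p = 4, q = 3.
product = kronecker_bib_parameters((3, 3, 2, 2, 1), (4, 4, 3, 3, 2))
print(product == design_5_10(4, 3))  # True
```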

It has been remarked by Bose and Nair [2] that the design $N$ of (5.10) with
three associate classes reduces to a PBIB design with two associate classes if
$p = 2$ or $q = 2$. Because of Lemma 4.1 we can further add that the design $N$
with three associate classes reduces to a PBIB design with two associate classes
if $p = q$.

The method of taking Kronecker product of designs has been used to prove 
the impossibility of a certain class of PBIB designs and to analyse some other 
class of designs. It is hoped to publish at a later date some results in this direction. 

I wish to express my sincere thanks to Professor M. C. Chakrabarti under 
whose guidance this work was carried out. 
University of North Carolina and College of Science, Nagpur, India


Summary. The problem is considered of obtaining bounds for the (cumu- 
lative) distribution function of the sum of n independent, identically distributed 
random variables with k prescribed moments and given range. For n = 2 it is 
shown that the best bounds are attained or arbitrarily closely approached with 
discrete random variables which take on at most 2k + 2 values. For nonnega- 
tive random variables with given mean, explicit bounds are obtained when n = 
2; for arbitrary values of n, bounds are given which are asymptotically best in 
the "tail" of the distribution. Some of the results contribute to the more general
problem of obtaining bounds for the expected value of a given function of in- 
dependent, identically distributed random variables when the expected values of 
certain functions of the individual variables are given. Although the results are 
modest in scope, the authors hope that this paper will draw attention to a 
problem of both mathematical and statistical interest. 


1. Introduction. This paper considers part of the following general problem.
Let $\mathcal{D}$ be the class of all dfs (distribution functions) $F(x)$ on the real line
which satisfy the conditions

$\int g_i(x)\, dF(x) = c_i, \quad i = 1, \ldots, k; \qquad F(x) = \begin{cases} 0 & \text{if } x < A, \\ 1 & \text{if } x > B, \end{cases}$

where the functions $g_1(x), \ldots, g_k(x)$ and the constants $c_1, \ldots, c_k$, $A$, and $B$
are given. We allow that $A = -\infty$ and/or $B = \infty$. Here and in what follows,
when the domain of integration is not indicated, the integral extends over the
entire range of the variables involved.

Let $K(x_1, \ldots, x_n)$ be a function such that

$\psi(F) = \int \cdots \int K(x_1, \ldots, x_n)\, dF(x_1) \cdots dF(x_n)$

exists for all $F$ in $\mathcal{D}$ in the sense that the multiple integral is equal to the repeated
integral taken in an arbitrary order. The problem is to determine upper
and lower bounds for $\psi(F)$ when $F$ is in $\mathcal{D}$.

For $n = 1$, $g_i(x) = x^i$, and $K(x) = 1$ or 0 according as $x \le t$ or $> t$, as well
as for other functions $K(x)$, an extensive literature on the subject exists; for
some references see [4].

For $n$ arbitrary, Robbins [6] showed that the Bienaymé-Tchebycheff bound
for $\Pr(|X_1 + \cdots + X_n| \ge t)$, where the $X_i$ are independent and identically
distributed with zero mean and given variance, can be improved when $n > 1$.
Plackett [5], Gumbel [2], and Hartley and David [3] obtained the best possible
bounds for the expected sample range and the expected value of the largest
observation, in the case when the mean and the variance are given, assuming
that the common df is continuous. In a problem analogous to the general problem
stated above, but without the assumption that the $n$ variables are identically
distributed, one of the authors [4] showed that under general conditions the
best bounds are attained or arbitrarily closely approached with step-functions
in $\mathcal{D}$ which have at most $k + 1$ steps.

The present paper concentrates attention on the case where $K = 1$ or 0
according as a given function $f(x_1, \ldots, x_n)$ is or is not contained in a given
set. The method used permits one to obtain the closest bounds only for $n = 2$.
If $n$ is even, $f = x_1 + \cdots + x_n$, and $g_i(x) = x^i$, the bounds for $n = 2$ can be
applied in an obvious way, but in general will not be the best ones. More general
functions $K$ are considered only insofar as they can be handled by the same
method.

Theorem 2.1 states conditions under which we need consider only step-functions
in $\mathcal{D}$. Theorems 2.2 and 2.3 show that for functions $K(x, y)$ of a certain
type we may restrict our attention to step-functions with a bounded number of
steps. In Theorem 3.1 an explicit expression for the least upper bound of
$\Pr(X + Y \ge t)$ is obtained when $X$ and $Y$ are nonnegative, independent, and
identically distributed with given mean. In Section 4 bounds for the analogous
case with $n$ summands are considered.


2. The least upper bound of $\int\!\!\int K(x, y)\, dF(x)\, dF(y)$. Let $K(x, y)$ be a function
such that

(2.1) $\psi(F) = \int\!\!\int K(x, y)\, dF(x)\, dF(y)$

exists for all $F$ in $\mathcal{D}$, in the sense that the double integral equals the repeated
integral. The problem is to determine the least upper bound of $\psi(F)$ for all
$F$ in $\mathcal{D}$.

Let $\mathcal{D}^*$ be the class of all $F$ in $\mathcal{D}$ which are step-functions with a finite number
of steps. The following theorem shows that if $\mathcal{D}$ is the class of dfs with $k$ prescribed
moments and given range, and $\psi(F)$ is the probability that two independent
observations on a random variable with df $F$ fall into a set of a rather
general type, we may confine our attention to dfs in $\mathcal{D}^*$.

THEOREM 2.1. Let $g_i(x) = x^{m_i}$, where $m_1, \ldots, m_k$ are positive integers. Let
$K(x, y) = 1$ or 0 according as $(x, y)$ is or is not contained in a Borel set $S$ such
that the sets $\{x: (x, y) \in S, y \text{ fixed}\}$ and $\{y: (x, y) \in S, x \text{ fixed}\}$ are unions of a
finite and bounded number of intervals (which may be infinite). Then

$\sup_{F \in \mathcal{D}} \psi(F) = \sup_{F \in \mathcal{D}^*} \psi(F).$

The theorem follows immediately from an obvious analog of Lemma 2.1 in [4]
and Lemma 3.1 and Theorem 4.1 of [4].


It can be seen from [4] that the reduction to distributions in D* is possible 
under more general conditions. 

We shall now derive sufficient conditions under which, given a step-function
$F$ in $\mathcal{D}$ with $m$ steps, we can construct a step-function $G$ in $\mathcal{D}$ with less than $m$
steps such that $\psi(G) \ge \psi(F)$.

A step-function $F$ in $\mathcal{D}$ with exactly $m$ steps is of the form

(2.2) $F(x) = P_j$ if $a_j \le x < a_{j+1}$, $\quad j = 0, 1, \ldots, m,$

where

(2.3) $-\infty = a_0 < a_1 < \cdots < a_m < a_{m+1} = \infty, \quad A \le a_1, \; a_m \le B;$

(2.4) $0 = P_0 < P_1 < \cdots < P_{m-1} < P_m = 1;$

(2.5) $\sum_{j=1}^{m-1} h_{ij} P_j = c_i - g_i(a_m), \quad i = 1, \ldots, k;$

(2.6) $h_{ij} = g_i(a_j) - g_i(a_{j+1}).$

Let

(2.7) $G(x) = P_j + t D_j$ if $a_j \le x < a_{j+1}$, $\quad j = 0, \ldots, m.$

In order that $G(x)$ be a df in $\mathcal{D}$ it is sufficient that the numbers $t$ and $D_j$ satisfy
the conditions

(2.8) $D_0 = D_m = 0;$

(2.9) $0 \le P_1 + t D_1 \le P_2 + t D_2 \le \cdots \le P_{m-1} + t D_{m-1} \le 1;$

(2.10) $\sum_{j=1}^{m-1} h_{ij} D_j = 0, \quad i = 1, \ldots, k.$

If $F$ and $G$ are defined by (2.2) and (2.7), we have

(2.11) $\psi(G) - \psi(F) = t \sum_{i=1}^{m-1} L_i D_i + t^2 \sum_{i=1}^{m-1} \sum_{j=1}^{m-1} L_{ij} D_i D_j,$

where, with $K_{ij} = K(a_i, a_j)$,

(2.12) $L_i = \sum_{j=1}^{m} (K_{ij} + K_{ji} - K_{i+1,j} - K_{j,i+1})(P_j - P_{j-1}),$

(2.13) $L_{ij} = K_{ij} - K_{i,j+1} - K_{i+1,j} + K_{i+1,j+1}.$
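The expansion (2.11) with the coefficients (2.12) and (2.13) is an algebraic identity in the $P_j$, $D_j$ and $K_{ij}$, so it can be spot-checked numerically. The sketch below is an illustration with hypothetical function names; it reads the linear coefficient as a sum over the second index and uses an arbitrary random kernel, and it does not impose the moment conditions (2.10), which the identity does not need.

```python
import numpy as np


def psi(K, P):
    """psi(F) = sum_ij K_ij p_i p_j, where p_j = P_j - P_{j-1} are the jumps of F."""
    p = np.diff(P)
    return float(p @ K @ p)


def increment(K, P, D, t):
    """Right-hand side of (2.11), built from L_i of (2.12) and L_ij of (2.13)."""
    p = np.diff(P)
    S = K + K.T                                   # S_ij = K_ij + K_ji
    L1 = (S[:-1] - S[1:]) @ p                     # L_i for i = 1, ..., m-1
    L2 = K[:-1, :-1] - K[:-1, 1:] - K[1:, :-1] + K[1:, 1:]   # L_ij
    d = D[1:-1]                                   # D_1, ..., D_{m-1}
    return float(t * (L1 @ d) + t**2 * (d @ L2 @ d))


rng = np.random.default_rng(0)
m = 5
K = rng.integers(0, 2, size=(m, m)).astype(float)        # arbitrary values K(a_i, a_j)
P = np.concatenate(([0.0], np.sort(rng.uniform(size=m - 1)), [1.0]))   # P_0, ..., P_m
D = np.concatenate(([0.0], rng.normal(size=m - 1), [0.0]))             # D_0 = D_m = 0
t = 0.01

print(np.isclose(psi(K, P + t * D) - psi(K, P), increment(K, P, D, t)))  # True
```

The check relies on summation by parts: with $D_0 = D_m = 0$, a sum of the form $\sum_j c_j (D_j - D_{j-1})$ equals $\sum_{i=1}^{m-1} (c_i - c_{i+1}) D_i$, which is where the differences of $K$ in (2.12) and (2.13) come from.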


LEMMA 2.1. Let $F$ be a step-function in $\mathcal{D}$ with exactly $m$ steps, defined by (2.2)
to (2.6), where $m > k + 1$. Suppose that the integers $u_1, \ldots, u_{k+1}$ can be so chosen
that

$1 \le u_1 < u_2 < \cdots < u_{k+1} \le m - 1$

and the equations

(2.14) $\sum_{r=1}^{k+1} h_{i u_r} t_r = 0, \quad i = 1, \ldots, k,$

imply

(2.15) $\sum_{r=1}^{k+1} \sum_{s=1}^{k+1} L_{u_r u_s} t_r t_s \ge 0.$

Then there exists a step-function $G$ in $\mathcal{D}$ with less than $m$ steps, for which
$\psi(G) \ge \psi(F)$.

PROOF. Let $G(x)$ be defined by (2.7), and let $D_j = 0$ for $j \ne u_1, \ldots, u_{k+1}$.
Let $\lambda = 1$ or 0 according as the rank of the matrix

$\begin{Vmatrix} h_{1 u_1} & \cdots & h_{1 u_{k+1}} \\ \vdots & & \vdots \\ h_{k u_1} & \cdots & h_{k u_{k+1}} \\ L_{u_1} & \cdots & L_{u_{k+1}} \end{Vmatrix}$

is equal to or less than $k + 1$. Then the equations (2.14), with $t_r = D_{u_r}$, and

$\sum_{r=1}^{k+1} L_{u_r} D_{u_r} = \lambda$

have a solution $(D_{u_1}, \ldots, D_{u_{k+1}}) \ne (0, \ldots, 0)$. Having thus fixed the $D_j$,
let $t$ be the largest number which satisfies the inequalities (2.9). This number
exists and is positive. With this choice of the numbers $t$ and $D_j$, $G$ is a step-function
in $\mathcal{D}$ with less than $m$ steps. Furthermore, by (2.11),

$\psi(G) - \psi(F) = \lambda t + t^2 \sum_{r=1}^{k+1} \sum_{s=1}^{k+1} L_{u_r u_s} D_{u_r} D_{u_s} \ge 0.$

The proof is complete.

The next theorem shows that if $K(x, y)$ is of a certain form, and if we restrict
ourselves to the class $\mathcal{D}^*$ of step-functions in $\mathcal{D}$ with a finite number of steps,
we need consider only step-functions with a bounded number of steps.

Let $\mathcal{D}_m$ be the class of all $F$ in $\mathcal{D}$ which are step-functions with at most $m$
steps.

THEOREM 2.2. Suppose that $K(x, y)$ is of the form

$K(x, y) = \sum_{i=0}^{k} \sum_{j=0}^{k} a_{tij}\, g_i(x) g_j(y)$ if $b_{t-1} \le f(x, y) < b_t$, $\quad t = 1, \ldots, s,$

where $g_0(x) = 1$, the $a_{tij}$ are arbitrary constants, the $b_t$ satisfy

$-\infty = b_0 < b_1 < \cdots < b_{s-1} < b_s = \infty,$

and $f(x, y)$ is a strictly increasing function in each of its arguments when the other
argument is fixed. Then

$\sup_{F \in \mathcal{D}^*} \psi(F) = \sup_{F \in \mathcal{D}_{sk+s}} \psi(F).$

The theorem remains true if in the inequalities $b_{t-1} \le f(x, y) < b_t$ some signs $\le$
are replaced by $<$ or vice versa, provided that the $s$ sets defined by the inequalities
cover the entire plane.

PROOF. Let $F(x)$, as defined by (2.2) to (2.6), be an arbitrary step-function
in $\mathcal{D}$ with exactly $m$ steps, where $m > sk + s$. It is sufficient to construct a step-function
$G$ in $\mathcal{D}$ with less than $m$ steps such that $\psi(G) \ge \psi(F)$. Let $m_t$, for
$t = 1, \ldots, s$, denote the number of indices $u$, with $1 \le u \le m$, for which

$b_{t-1} \le f(a_u, a_u) < b_t.$

Then $s \max(m_t) \ge (m_1 + \cdots + m_s) = m > s(k + 1)$. Hence there exists a
$t$ for which $m_t \ge k + 2$ and an integer $n$ such that

$b_{t-1} \le f(a_n, a_n) < f(a_{n+k+1}, a_{n+k+1}) < b_t.$

The assumption about $f(x, y)$ implies that

$K_{vw} = \sum_{i=0}^{k} \sum_{j=0}^{k} a_{tij}\, g_i(a_v) g_j(a_w), \quad n \le v, w \le n + k + 1.$

By (2.13) and (2.6) this implies

$L_{vw} = \sum_{i=1}^{k} \sum_{j=1}^{k} a_{tij}\, h_{iv} h_{jw}, \quad n \le v, w \le n + k.$

Hence if we let $u_r = n + r - 1$ for $r = 1, 2, \ldots, k + 1$, the conditions of
Lemma 2.1 are satisfied. The proof is complete.

If $g_i(x) = x^i$, that is, if $\mathcal{D}$ is the class of distributions with given moments
up to order $k$ and given range, the assumption of Theorem 2.2 means that
$K(x, y)$ is piecewise polynomial, of bounded degrees, in sections of the plane
separated by curves of negative slope. If $K(x, y)$ is piecewise polynomial in
sections separated by curves of positive slope, a similar reduction of the problem
to the case of step-functions with a bounded number of steps is in general impossible.
For example, let $K(x, y) = \max(x, y)$, and let $\mathcal{D}$ be the class of dfs
$F$ with

$\int x\, dF(x) = 0, \quad \int x^2\, dF(x) = 1.$

Under the restriction to continuous functions $F(x)$, this is a special case of a
problem considered by Hartley and David [3] and Gumbel [2]. For an arbitrary
df $F(x)$ we can write

$\psi(F) = 2 \int x \bar{F}(x)\, dF(x), \quad \bar{F}(x) = \tfrac{1}{2}\{F(x - 0) + F(x + 0)\}.$

Using Schwarz's inequality, we have for any constant $c$ and any $F$ in $\mathcal{D}$

$\psi(F) + c = 2 \int (x + c) \bar{F}(x)\, dF(x) \le 2 \Bigl\{(1 + c^2) \int \bar{F}(x)^2\, dF(x)\Bigr\}^{1/2}.$

If $F(x)$ is continuous, $\int F(x)^2\, dF(x) = \tfrac{1}{3}$, and the bound

$\psi(F) \le \min_c \{2 \cdot 3^{-1/2} (1 + c^2)^{1/2} - c\}$

is attained with a continuous df in $\mathcal{D}$, as shown by Hartley and David.
Now let $F(x)$ be a step-function with at most $m$ steps which takes on the
values $0 = P_0 \le P_1 \le \cdots \le P_{m-1} \le P_m = 1$. Then

$4 \int \bar{F}(x)^2\, dF(x) = \sum_{j=1}^{m} (P_{j-1} + P_j)^2 (P_j - P_{j-1}).$

This can be written

$12 \int \bar{F}(x)^2\, dF(x) = 4 - \sum_{j=1}^{m} p_j^3, \quad p_j = P_j - P_{j-1}.$

The conditions $\sum p_j = 1$ and $p_j \ge 0$ imply $\sum p_j^3 \ge m^{-2}$. Hence

$\int \bar{F}(x)^2\, dF(x) \le \tfrac{1}{3} - 1/(12 m^2),$

and the Hartley-David bound cannot be approached arbitrarily closely with
a step-function in $\mathcal{D}$ having a bounded number of steps.
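The identity for $\int \bar{F}^2\, dF$ and the resulting bound can be checked numerically for a step-function with arbitrary jumps. The sketch below is an illustration only; the function name `integral_Fbar_squared` and the randomly drawn jump sizes are hypothetical choices.

```python
import numpy as np


def integral_Fbar_squared(p):
    """Integral of Fbar(x)^2 dF(x) for a step-function with jumps p_1, ..., p_m summing to 1."""
    P = np.concatenate(([0.0], np.cumsum(p)))     # P_0, ..., P_m
    midpoints = 0.5 * (P[:-1] + P[1:])            # value of Fbar at the j-th jump
    return float(np.sum(midpoints**2 * p))


rng = np.random.default_rng(1)
m = 6
p = rng.dirichlet(np.ones(m))                     # random nonnegative jumps with sum 1
value = integral_Fbar_squared(p)

print(np.isclose(12 * value, 4 - np.sum(p**3)))             # identity used in the text
print(value <= 1 / 3 - 1 / (12 * m**2) + 1e-12)             # bound for at most m steps
equal_jumps = integral_Fbar_squared(np.full(m, 1 / m))
print(np.isclose(equal_jumps, 1 / 3 - 1 / (12 * m**2)))     # equality for equal jumps
```

The identity follows term by term from $3(a + b)^2 (b - a) = 4(b^3 - a^3) - (b - a)^3$ with $a = P_{j-1}$, $b = P_j$, and the bound is tight exactly when all jumps equal $1/m$.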

Combining Theorems 2.1 and 2.2 we can state that if the conditions of both
theorems are satisfied, then

$\sup_{F \in \mathcal{D}} \psi(F) = \sup_{F \in \mathcal{D}_{sk+s}} \psi(F).$

In particular, the conditions of Theorem 2.2 are fulfilled if $\psi(F) =
P_F\{f(X, Y) \ge c\}$, or $= P_F\{|f(X, Y)| \ge c\}$, etc., where $P_F\{\cdots\}$ is the probability
of the event in braces when $X$ and $Y$ are independent with common df $F$,
and $f(x, y)$ has the property stated in the theorem. Using Theorem 2.1, we
obtain:

THEOREM 2.3. Let $\mathcal{D}$ be the class of dfs $F(x)$ which satisfy the conditions

$\int x^{m_i}\, dF(x) = c_i, \quad i = 1, \ldots, k; \qquad F(x) = \begin{cases} 0 & \text{if } x < A, \\ 1 & \text{if } x > B, \end{cases}$

with given integers $m_1, \ldots, m_k$ and given numbers $c_1, \ldots, c_k$, $A$, $B$, where we may
have $A = -\infty$ and/or $B = \infty$. Let $f(x, y)$ be a strictly increasing function in
each of its arguments when the other argument is fixed. Then

$\sup_{F \in \mathcal{D}} P_F\{f(X, Y) \ge c\} = \sup_{F \in \mathcal{D}_{2k+2}} P_F\{f(X, Y) \ge c\}.$
3. The least upper bound of $P(X + Y \ge t)$ when $X$ and $Y$ are nonnegative,
independent, and identically distributed with given mean. As an application of
the results of Section 2 we shall prove the following theorem.

THEOREM 3.1. Let $X$ and $Y$ be two independent random variables with common
df $F(x)$. Let $\mathcal{D}$ be the class of dfs $F$ with $F(x) = 0$ for $x < 0$ and $\int x\, dF(x) = \mu$,
where $\mu > 0$. Then

(3.1) $\sup_{F \in \mathcal{D}} P_F\{X + Y \ge c\mu\} = \begin{cases} 1, & c \le 2; \\ 4/c^2, & 2 \le c \le \tfrac{5}{2}; \\ 2/c - 1/c^2, & \tfrac{5}{2} \le c. \end{cases}$

The three bounds are attained with the respective distributions

$P(X = \mu) = 1;$
$P(X = 0) = 1 - 2/c, \quad P(X = \tfrac{1}{2} c\mu) = 2/c;$
$P(X = 0) = 1 - 1/c, \quad P(X = c\mu) = 1/c.$

Theorem 3.1 should be compared with the solution by Birnbaum, Raymond,
and Zuckerman [1] of the analogous problem without the restriction that $X$
and $Y$ be identically distributed. If $M(c)$ denotes the least upper bound of
$P(X + Y \ge c\mu)$ when $X$ and $Y$ are nonnegative, independent, and have the
common mean $\mu$, we have by [1]

(3.2) $M(c) = \begin{cases} 1, & c \le 2; \\ 1/(c - 1), & 2 \le c \le \tfrac{1}{2}(3 + \sqrt{5}); \\ 2/c - 1/c^2, & \tfrac{1}{2}(3 + \sqrt{5}) \le c. \end{cases}$

Hence the bound (3.1) is smaller than the Birnbaum-Raymond-Zuckerman
bound if and only if $2 < c < \tfrac{1}{2}(3 + \sqrt{5})$.
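The two bounds and the attaining distributions are easy to tabulate. The sketch below is an illustration with hypothetical function names; it takes $\mu = 1$, uses exact rational arithmetic for (3.1), and evaluates the tail probability only for two-point distributions on $\{0, x\}$, which is all the theorem's extremal distributions require for $c > 2$.

```python
import math
from fractions import Fraction


def sup_iid(c):
    """Right-hand side of (3.1) for mu = 1."""
    c = Fraction(c)
    if c <= 2:
        return Fraction(1)
    if c <= Fraction(5, 2):
        return 4 / c**2
    return 2 / c - 1 / c**2


def sup_independent(c):
    """Right-hand side of (3.2) for mu = 1 (floating point, because of sqrt(5))."""
    if c <= 2:
        return 1.0
    if c <= (3 + math.sqrt(5)) / 2:
        return 1 / (c - 1)
    return 2 / c - 1 / c**2


def tail_two_point(c, x, q):
    """P(X + Y >= c) for i.i.d. X, Y with P(X = x) = q and P(X = 0) = 1 - q."""
    c, x, q = Fraction(c), Fraction(x), Fraction(q)
    outcomes = [(Fraction(0), (1 - q)**2), (x, 2 * q * (1 - q)), (2 * x, q**2)]
    return sum((prob for total, prob in outcomes if total >= c), Fraction(0))


c = Fraction(9, 4)                                      # a value in the middle range
print(sup_iid(c), tail_two_point(c, c / 2, 2 / c))      # 64/81 64/81
print(sup_iid(3), tail_two_point(3, 3, Fraction(1, 3))) # 5/9 5/9
print(float(sup_iid(c)) < sup_independent(float(c)))    # True
```

Both two-point distributions have mean 1 by construction, so they belong to the class of the theorem with $\mu = 1$.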

PROOF OF THEOREM 3.1. We may and shall assume that $\mu = 1$. By Theorem
2.3 we need consider only dfs $F$ in $\mathcal{D}$ which are step-functions with $m \le 4$
steps. Then $F$ is of the form (2.2) to (2.4), where $A = 0$ and $B = \infty$, and
$\sum_{j=1}^{m} a_j (P_j - P_{j-1}) = 1$. We have

$K(x, y) = \begin{cases} 1, & x + y \ge c; \\ 0, & x + y < c. \end{cases}$

Hence the numbers $K_{ij} = K(a_i, a_j)$ satisfy the conditions

$K_{ij} = 0$ or 1; $\quad K_{ij} = K_{ji}$; $\quad K_{ij} \le K_{hj}$ if $i < h$.

The sequence $(K_{11}, K_{22}, \ldots, K_{mm})$ consists of a sequence of zeros followed
by a sequence of ones. The reasoning used in the proof of Theorem 2.2 shows
that any distribution for which there are more than two consecutive zeros or
more than two consecutive ones in this sequence can be replaced by a distribution
with less than $m$ steps which does not decrease the value of $\psi(F)$.

Hence for $m = 4$ we need consider only matrices $\|K_{ij}\|$ of the four types

lo 0 Oo -| ] 0 0! 10 0 
10 0 0 | 11); |0 0 
001 1)? aap? Vor 
PS te . a6 ae | 
where the numbers represented by dots need not be specified. The corresponding 
matrices $\|L_{ij}\|$ are 


I II III IV 
Me O58. JO 8. | O24] | 0 0| 


| 


weet, 1t 1.08 - Ste 7 ele fe 1. of: 
ae ee ae le ee en 0 0| 


We shall apply Lemma 2.1 to show that in every case there exists a df in $\mathcal{D}$ 
with at most three steps which does not decrease the value of $\psi(F)$. It is sufficient 
to find integers $u$ and $v$ with $1 \le u < v \le 3$ such that the equation 

$$(a_u - a_{u+1})x + (a_v - a_{v+1})y = 0 \tag{3.3}$$

implies 

$$L_{uu}x^2 + 2L_{uv}xy + L_{vv}y^2 \ge 0. \tag{3.4}$$

Inequality (3.4) is satisfied in Case I with $u = 1$ and $v = 2$, and in Cases II 
and IV with $u = 1$ and $v = 3$. In Case III, when $u = 1$ and $v = 3$, the left 
side of (3.4) is $-2xy$, which is nonnegative by (3.3), since $a_i - a_{i+1} < 0$. 

Hence we may confine our attention to step-functions in D with m <S 3 


|0 oO 1) 
0 0 1) 


}1 1 


The corresponding matrices ||L;,;|| are 
A B C | 
jo of jo 1) jo of a of 1] ||-1 0| 
Ho ai? fia —a]) 0 -1\" | I-1 ol” =| 0 Of: 


In applying Lemma 2.1 we have to take u = 1 and v = 2 and show that 
(3.3) implies (3.4). This is true for the matrices A, D, and E. In Cases B, C, 
and F, Lemma 2.1 is not applicable. 

In Case C, $\psi(F) = 1 - P_1^2$. If $G(x)$ is defined by (2.7) with $m = 3$, we have 

$$\psi(G) - \psi(F) = -tD_1[2(P_1 + tD_1) - tD_1],$$

where $t$ and $D_i$ satisfy the conditions 

$$(a_1 - a_2)D_1 + (a_2 - a_3)D_2 = 0, \tag{3.5}$$

$$0 \le P_1 + tD_1 \le P_2 + tD_2 \le 1. \tag{3.6}$$

Let $D_1 = -1$. Then $D_2$ is given by (3.5). Let $t$ be the largest number which 
satisfies (3.6). Then $t > 0$, $G$ is in $\mathcal{D}$, and $\psi(G) - \psi(F) \ge 0$. 

In case F, ¥(F) = 1 — Pj, and similar reasoning shows that this case 
also can be reduced to a step-function with at most two steps. 

The only remaining case with $m = 3$ is Case B. Here we can write $\psi(F) = 
2p_2p_3 + p_3^2$, where (admitting the possibility that $F$ has fewer than three steps) 

$$p_1 + p_2 + p_3 = 1, \qquad a_1p_1 + a_2p_2 + a_3p_3 = 1; \tag{3.7}$$

$$p_1 \ge 0, \quad p_2 \ge 0, \quad p_3 \ge 0; \tag{3.8}$$

$$0 \le a_1 \le a_2 \le a_3; \tag{3.9}$$

$$a_1 + a_3 < c, \qquad 2a_2 < c, \qquad c \le a_2 + a_3. \tag{3.10}$$

Expressing $\psi(F)$ in terms of $a_1, a_2, a_3, p_1$, we get 

$$\psi(F) = (1 - p_1)^2 - p_2^2, \qquad p_2 = \frac{a_3 - 1 - (a_3 - a_1)p_1}{a_3 - a_2}.$$

If $a_2$, $a_3$, and $p_1$ are held fixed, $\psi(F)$ is a decreasing function of $a_1$. Hence we 
maximize $\psi(F)$ by choosing the least possible value for $a_1$. This is the greatest 
of the bounds given by the inequalities $p_i \ge 0$ and $a_1 \ge 0$. If this bound is 
given by one of the equations $p_i = 0$, we get a distribution in $\mathcal{D}_2$. Hence we 
may assume that the least value is $a_1 = 0$, so 

$$p_2 = \frac{a_3 - 1 - a_3p_1}{a_3 - a_2},$$

and $\psi(F)$ is a decreasing function of $a_2$ when $a_3$ and $p_1$ are fixed. The only lower 
bound for $a_2$ which does not necessarily correspond to a distribution in $\mathcal{D}_2$ is 
$a_2 = c - a_3$. In this case 

$$p_2 = \frac{a_3 - 1 - a_3p_1}{2a_3 - c} = \frac{1 - p_1}{2} + \frac{c(1 - p_1) - 2}{2(2a_3 - c)}.$$

This is a monotonic function of $a_3$ (possibly a constant) when $p_1$ is held fixed. 
Hence the maximum is attained at one of the endpoints of the range of $a_3$. 
This range is given by the inequalities (3.8) to (3.10) with $a_1 = 0$ and $a_2 = 
c - a_3$. Its endpoints correspond either to distributions in $\mathcal{D}_2$ or (if given by 
$a_1 + a_3 = c$ or $2a_2 = c$) to cases where the value of $\psi(F)$ exceeds $2p_2p_3 + p_3^2$ 
and which already have been disposed of. 

Thus we need consider only dfs in $\mathcal{D}_2$. 

If $c \le 2$, we have $\psi(F) = 1$ for the df in $\mathcal{D}_2$ which has a single step at $x = 1$. 
Thus 

$$\sup_{F \in \mathcal{D}} \psi(F) = 1, \qquad c \le 2. \tag{3.11}$$

Henceforth we assume that $c > 2$. 

A distribution $F$ in $\mathcal{D}_2$ assigns to the points $a_1$ and $a_2$ the respective 
probabilities 

$$p_1 = \frac{a_2 - 1}{a_2 - a_1}, \qquad p_2 = \frac{1 - a_1}{a_2 - a_1}, \qquad 0 \le a_1 \le 1 \le a_2.$$

If $c \le 2a_1$, we have $c \le 2$, a case already considered. If $c > 2a_2$, then 
$\psi(F) = 0$, a case which may be disregarded. We are left with the two cases 

(i) $2a_1 < c \le a_1 + a_2$, (ii) $a_1 + a_2 < c \le 2a_2$. 

In Case (i), $\psi(F) = 1 - p_1^2$, which is a decreasing function of $a_1$. The lower 
bound for $a_1$ is $\max(0, c - a_2)$. 

If $a_1 = 0 \ge c - a_2$, then $p_1 = 1 - 1/a_2$, so that $\psi(F)$ is a decreasing function 
of $a_2$. The lower bound for $a_2$ is $\max(1, c) = c$, and we obtain 
$\psi(F) = 1 - (1 - 1/c)^2 = 2/c - 1/c^2$. 

If $a_1 = c - a_2 \ge 0$, then 

$$p_1 = \frac{a_2 - 1}{2a_2 - c} = \frac12 + \frac{c - 2}{2(2a_2 - c)},$$

so that $\psi(F)$ is an increasing function of $a_2$. Since $a_2 \le c$, we obtain the same 
maximum of $\psi(F)$ as in the previous case. 

In Case (ii), $\psi(F) = p_2^2$, which is a decreasing function of $a_2$. Hence we let 
$a_2 = \tfrac12 c$. Then $\psi(F)$ is a decreasing function of $a_1$, and hence is maximized for 
$a_1 = 0$. We get $\psi(F) = 4/c^2$. Hence 

$$\sup_{F \in \mathcal{D}} \psi(F) = \max\{2/c - 1/c^2,\ 4/c^2\}, \qquad c > 2. \tag{3.12}$$

Theorem 3.1 now follows from (3.11), (3.12) and the stated conditions under 
which the bounds are attained. 
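
The reduction to two-point distributions lends itself to a brute-force check. The sketch below is an illustration only, not the authors' method: it normalizes $\mu = 1$ as in the proof, scans two-point distributions on a rational grid (the grid step of 1/100 and the function names are assumptions of this sketch), and compares the largest value of $\psi(F)$ found with (3.12). Exact rational arithmetic avoids rounding problems at the boundaries $a_1 + a_2 = c$ and $2a_2 = c$.

```python
from fractions import Fraction

def psi_two_point(a1, a2, c):
    """P(X + Y >= c) for i.i.d. X, Y on {a1, a2} with mean 1, 0 <= a1 <= 1 <= a2."""
    if a1 == a2:                      # only possible when both points equal 1
        return Fraction(1) if 2 * a1 >= c else Fraction(0)
    p1 = (a2 - 1) / (a2 - a1)         # probability of the lower point
    p2 = (1 - a1) / (a2 - a1)         # probability of the upper point
    total = Fraction(0)
    if 2 * a1 >= c:
        total += p1 * p1
    if a1 + a2 >= c:
        total += 2 * p1 * p2
    if 2 * a2 >= c:
        total += p2 * p2
    return total

def grid_sup(c, step=Fraction(1, 100)):
    """Largest psi over a grid of two-point distributions (assumed grid, mu = 1)."""
    n1 = int(1 / step)                # a1 runs over 0, step, ..., 1
    n2 = int(2 * c / step)            # a2 runs over 1, 1 + step, ..., 2c
    best = Fraction(0)
    for k in range(n1 + 1):
        for m in range(n1, n2 + 1):
            best = max(best, psi_two_point(k * step, m * step, c))
    return best

def theorem_bound(c):
    """Right-hand side of (3.12), valid for c > 2."""
    return max(2 / c - 1 / c**2, 4 / c**2)

c = Fraction(11, 5)
print(grid_sup(c), theorem_bound(c))
```

Because the maximizing points ($a_1 = 0$ with $a_2 = c/2$ or $a_2 = c$) lie on the grid for the values of $c$ tried here, the grid maximum agrees with the bound exactly.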


4. Bounds for $P(X_1 + \cdots + X_n \ge c)$. Let $\bar X_n = n^{-1}(X_1 + \cdots + X_n)$, 
and let $w_n(t)$ denote the least upper bound of $P(\bar X_n \ge t\mu)$ when $X_1, \ldots, X_n$ 
are nonnegative, independent, and identically distributed with mean $\mu$. It is 
easily seen that for every $n$ 

$$w_n(t) = 1 \ \text{ if } t \le 1; \qquad w_{sn}(t) \le w_n(t), \quad s = 1, 2, \ldots.$$

By Markov's inequality, $w_1(t) = 1/t$ if $1 \le t$. By Theorem 3.1, 

$$
w_2(t) =
\begin{cases}
1/t^2, & 1 \le t \le 5/4; \\
1/t - 1/(4t^2), & 5/4 \le t.
\end{cases}
$$

Let $w_n^*(t)$ be the least upper bound of $P(\bar X_n \ge t\mu)$ when $X_1, \ldots, X_n$ are 
independent and nonnegative with common mean $\mu$. Clearly, $w_n(t) \le w_n^*(t)$. 
From [1] (in particular, Corollary 2.2) we have 

n even; 


3+ V5 <,. 
4 re 


3n + 1 + (5n’ + 6n + 5)" — ,. 
4n oor 





n arbitrary. 

On the other hand, for any random variables $X_i$ which satisfy our assumptions, 
$P(\bar X_n \ge t\mu)$ is a lower bound for $w_n(t)$. In particular, if $nt \ge 1$ and $X_i = 0$ 
or $nt\mu$ with respective probabilities $1 - 1/nt$ and $1/nt$, we get 

$$w_n(t) \ge 1 - \left(1 - \frac{1}{nt}\right)^n \ge \frac1t - \frac{n - 1}{2n}\,\frac{1}{t^2}.$$
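
As a numerical illustration of this lower bound (a sketch with $\mu = 1$; the function names are illustrative and not from the paper), the tail probability of the two-point distribution can be summed as a binomial probability and compared with the closed form and with the quadratic lower bound.

```python
from math import comb

def two_point_tail(n, t):
    """P(mean of n i.i.d. X_i >= t) when X_i = n*t with prob. 1/(n*t), else 0.

    The mean reaches t exactly when at least one X_i is nonzero, so the
    probability is a binomial tail summed over k >= 1 nonzero observations.
    """
    p = 1.0 / (n * t)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(1, n + 1))

def closed_form(n, t):
    """1 - (1 - 1/(nt))^n, the closed form of the same probability."""
    return 1.0 - (1.0 - 1.0 / (n * t))**n

def quadratic_lower_bound(n, t):
    """1/t - (n - 1)/(2 n t^2), obtained by truncating the binomial expansion."""
    return 1.0 / t - (n - 1) / (2.0 * n * t * t)

for n in (1, 2, 5, 20):
    print(n, two_point_tail(n, 3.0), closed_form(n, 3.0), quadratic_lower_bound(n, 3.0))
```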
Hence we have for all positive integers n 


: 1/2 
(41) w() mw L—Lhen—11 ¢ nt+1+ (Gn +ont 5)" < 


t, 
4n 


1 


—-S6<1; 
n 


os¢<1i-2. 
n 


(4.2) 


Equation (4.1) is also true for $w_n^*(t)$, and (4.2) holds for $w_{2n}(t)$ if $\tfrac14(3 + \sqrt5) \le t$. 
Thus for large values of $t$ the known bounds for $w_n(t)$ and $w_n^*(t)$ cannot be 
improved substantially. 

SOME MINIMAX INVARIANT PROCEDURES FOR ESTIMATING A 
CUMULATIVE DISTRIBUTION FUNCTION 


By Om P. AGGARWAL 
Purdue University and University of Washington 


1. Summary. Some invariant procedures, which are essentially step-functions, 
are considered as estimators of the cumulative distribution function of a one- 
dimensional random variable on which a finite fixed number of observations are 
given, for various loss functions. Two principal classes of loss functions are 
considered and it is shown that for a special loss function in one class the optimum 
procedure is the usual sample cumulative function. 


2. Introduction. Suppose that a sample $X_1, X_2, \ldots, X_n$ of a one-dimensional 
chance variable $X$ is given. In a recent paper, Birnbaum [1] has discussed various 
techniques for deciding whether $X$ has a completely specified continuous cumulative 
distribution function (c.d.f.), $H(x) = P(X \le x)$. This paper discusses 
an allied problem: if $F(x) = P(X \le x)$ is the unknown continuous 
c.d.f. of $X$ and $\hat F(x)$ is an estimate of $F(x)$ based on the sample 
$X_1, X_2, \ldots, X_n$, what is the best estimate $\hat F$ when certain forms of the 
loss function are given? 

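Consider the loss function 

$$L(F, \hat F) = \int_{-\infty}^{\infty} |F(x) - \hat F(x)|^r\,dF(x), \tag{1}$$

where $r$ is an integer $\ge 1$. It is almost obvious that the only invariant procedures 
for estimating $F$ under the group of all one-to-one monotone transformations of 
the real numbers onto themselves which leave the sample values $X_i$ 
($i = 1, 2, \ldots, n$) invariant are those which estimate $F(x)$ by a step-function 

$$\hat F(x) = \text{constant, say } c_j, \quad \text{for } X^{(j)} \le x < X^{(j+1)}, \quad j = 0, 1, \ldots, n, \tag{2}$$

where $X^{(1)} < X^{(2)} < \cdots < X^{(n)}$ are the ordered observations and $X^{(0)}$ and 
$X^{(n+1)}$ denote $-\infty$ and $+\infty$ respectively. 
Using this estimate $\hat F$, we get 

$$
\begin{aligned}
L(F, \hat F) &= \sum_{j=0}^{n} \int_{X^{(j)}}^{X^{(j+1)}} |F(x) - c_j|^r\,dF(x) \\
&= \frac{1}{r+1} \sum_{j=0}^{n} \Big[ \big(F(X^{(j+1)}) - c_j\big)\,\big|F(X^{(j+1)}) - c_j\big|^r
 - \big(F(X^{(j)}) - c_j\big)\,\big|F(X^{(j)}) - c_j\big|^r \Big],
\end{aligned}
\tag{3}
$$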

Received August 20, 1954; revised February 28, 1955. 

and the right-hand side of this equation is a symmetric function of 
$F(X_1), F(X_2), \ldots, F(X_n)$ where $X_1, X_2, \ldots, X_n$ is the unordered sample. 
Using the probability integral transformation, it is clear that the distribution of 
$L(F, \hat F)$ does not depend on $F$ for $F$ continuous. Hence the risk $R$, being the 
expectation of $L$ with respect to the distribution $F$, is constant and independent 
of $F$ itself. We can thus take $F$ to be a rectangular distribution over (0, 1) and 
write 

$$R = E \sum_{j=0}^{n} \int_{X_j}^{X_{j+1}} |x - c_j|^r\,dx, \tag{4}$$

where $X_1 < X_2 < \cdots < X_n$ is an ordered sample of size $n$ from this rectangular 
distribution over (0, 1), and $X_0$ and $X_{n+1}$ denote 0 and 1 respectively. Here and 
in the rest of this paper, the letter $E$ denotes that the expectation is taken with 
respect to the rectangular distribution over (0, 1). 
The same argument applies when the loss function is of the form 

$$L(F, \hat F) = \int_{-\infty}^{\infty} \frac{|F(x) - \hat F(x)|^r}{F(x)[1 - F(x)]}\,dF(x), \tag{5}$$

and in this case by taking $\hat F$ as in (2) we obtain 

$$R = E \sum_{j=0}^{n} \int_{X_j}^{X_{j+1}} \frac{|x - c_j|^r}{x(1 - x)}\,dx, \tag{6}$$

where $X_j$, $j = 0, 1, \ldots, n + 1$, are the same as in (4). 

It is obvious that since the risk $R$ is constant, a minimax procedure among the 
class of invariant procedures being considered will be to choose $c_j$, 
$j = 0, 1, \ldots, n$, such that $R$ is minimum. We consider in this paper the values of 
$c_j$ when the loss function is of the form (1) for all integers $r \ge 1$ and when the 
loss function is of the form (5) for $r = 1$ and when $r$ is an even integer $\ge 2$. 
The case when $r$ is odd in (5) seems to be rather complicated. 


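3. The loss function $L(F, \hat F) = \int_{-\infty}^{\infty} (F(x) - \hat F(x))^r\,dF(x)$ where $r$ is any positive 
even integer. Let $r = 2s$, then 

$$R = E \sum_{j=0}^{n} \int_{X_j}^{X_{j+1}} (x - c_j)^{2s}\,dx = \sum_{j=0}^{n} Q_j, \tag{7}$$

where 

$$Q_j = \frac{1}{2s+1} \sum_{k=1}^{2s+1} \binom{2s+1}{k} (-c_j)^{2s+1-k}\, E\big(X_{j+1}^k - X_j^k\big) \tag{8}$$

for $j = 0, 1, 2, \ldots, n$. 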

Since the distribution of the $j$th order statistic $X_j$ in a sample of size $n$ from 
the rectangular distribution over (0, 1) is a beta distribution with probability 
density 

$$p(y) = \frac{1}{B(j, n - j + 1)}\, y^{j-1}(1 - y)^{n-j}, \qquad 0 \le y \le 1, \tag{9}$$

it is easily seen that for any positive integer $r$, 

$$E(X_j^r) = \frac{j(j+1)\cdots(j+r-1)}{(n+1)(n+2)\cdots(n+r)}, \tag{10}$$

$$
E\big(X_{j+1}^r - X_j^r\big) =
\begin{cases}
\dfrac{r(j+1)\cdots(j+r-1)}{(n+1)(n+2)\cdots(n+r)} & \text{for } r \ne 1, \\[2ex]
\dfrac{1}{n+1} & \text{for } r = 1,
\end{cases}
\qquad j = 0, 1, \ldots, n. \tag{11}
$$
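
Formulas (10) and (11) are easy to confirm with exact arithmetic. The sketch below is illustrative only (the function names are assumptions): it evaluates (10) as a ratio of rising factorials, compares it with the ratio of beta functions implied by the density (9), and checks that successive differences reproduce (11), using the conventions $X_0 = 0$ and $X_{n+1} = 1$.

```python
from fractions import Fraction
from math import factorial

def moment_rising(j, n, r):
    """E(X_j^r) from (10): j(j+1)...(j+r-1) / ((n+1)...(n+r)).

    Gives 0 for j = 0 and 1 for j = n + 1, matching X_0 = 0 and X_{n+1} = 1.
    """
    num = den = 1
    for i in range(r):
        num *= j + i
        den *= n + 1 + i
    return Fraction(num, den)

def moment_beta(j, n, r):
    """E(X_j^r) as B(j + r, n - j + 1) / B(j, n - j + 1), for 1 <= j <= n."""
    return Fraction(factorial(j + r - 1) * factorial(n),
                    factorial(j - 1) * factorial(n + r))

def moment_increment(j, n, r):
    """E(X_{j+1}^r - X_j^r) from (11); the r = 1 case is the empty product."""
    num, den = r, 1
    for i in range(1, r):
        num *= j + i
    for i in range(1, r + 1):
        den *= n + i
    return Fraction(num, den)

print(moment_rising(2, 5, 3), moment_increment(2, 5, 3))
```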


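Substituting from (11) in (8) we obtain 

$$
\begin{aligned}
Q_j &= \frac{1}{2s+1} \sum_{k=1}^{2s+1} \binom{2s+1}{k} (-c_j)^{2s+1-k}\,
\frac{k(j+1)\cdots(j+k-1)}{(n+1)\cdots(n+k)} \\
&= \sum_{k=1}^{2s+1} \binom{2s}{k-1} (-c_j)^{2s+1-k}\,
\frac{(j+1)\cdots(j+k-1)}{(n+1)\cdots(n+k)}.
\end{aligned}
\tag{12}
$$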

For conciseness we introduce the following notation somewhat similar to the 
binomial and distinguished from it by an asterisk. Let 


(13) («- ot ty Yo ee . (—1)* () eypeti 


k=2 


imt bb +7 


for fixed real a and b and a positive integer g. For g = 0, let (13) be equal to 1. 
It is easily verified that for any positive integer r, 


3 oF csoMantah 


r a* (q—r)* 
(14) ( - ect.t) = . ( — ot) when r <q, 


0 when r> q. 


Using this notation we can write 


ale j+1\" 
~ o Braalecde. 


We have to choose $c_j$ so as to minimize $R$. Since $R = \sum_{j=0}^{n} Q_j$, and from (7) 
we see that for each $j$, $Q_j$ is positive and depends only on $c_j$, it is obvious that 
minimizing $R$ is equivalent to minimizing $Q_j$ separately for each $j$. We obtain 

$$\frac{\partial Q_j}{\partial c_j} = \frac{2s}{n+1} \left(c_j - \frac{j+1}{n+2}\right)^{(2s-1)*}, \tag{16}$$

$$\frac{\partial^2 Q_j}{\partial c_j^2} = \frac{2s(2s-1)}{n+1} \left(c_j - \frac{j+1}{n+2}\right)^{(2s-2)*}. \tag{17}$$

Since $Q_j = E \int_{X_j}^{X_{j+1}} (x - c_j)^{2s}\,dx > 0$, it is clear that 

$$\frac{\partial^2 Q_j}{\partial c_j^2} = 2s(2s - 1)\, E \int_{X_j}^{X_{j+1}} (x - c_j)^{2s-2}\,dx > 0. \tag{18}$$

Let $f(c_j) = \partial Q_j / \partial c_j$. It is easily seen that $f(0)$ is negative and $f(1)$ is positive, 
and since $f'(c_j) > 0$ for all real $c_j$, $f(c_j)$ is a strictly increasing function of $c_j$. 
Hence $f(c_j) = 0$ for one and only one real value of $c_j$, and this $c_j$ necessarily 
lies between 0 and 1. Thus we find that $Q_j$, and hence $R$, is minimized by setting 
$\partial Q_j / \partial c_j = 0$ and solving for $c_j$ the resulting equation 

$$\left(c_j - \frac{j+1}{n+2}\right)^{(r-1)*} = 0. \tag{19}$$

This equation has one and only one real root which lies between 0 and 1. The 
minimax invariant procedure for the loss function of this section is thus to estimate 
$F(x)$ by 

$$\hat F(x) = c_j, \qquad X_j \le x < X_{j+1}, \quad j = 0, 1, \ldots, n, \tag{20}$$

where $X_j$, $j = 0, 1, \ldots, n + 1$, have been defined earlier and $c_j$ is the real 
root of (19). It can further be seen from (19) that the equation remains unchanged 
if we replace $j$ by $n - j$ and $c_j$ by $1 - c_j$. Hence $c_{n-j} = 1 - c_j$, and 
we see that in practice the number of equations to be solved is about half the 
sample size. 

Special case for $r = 2$. When $r = 2$, the equation (19) reduces to a linear 
equation 

$$\left(c_j - \frac{j+1}{n+2}\right)^{1*} = 0, \tag{21}$$

which has the unique solution $c_j = (j + 1)/(n + 2)$. This result can, however, 
be obtained directly by writing the risk $R$ from (7) and (12) for $s = 1$ in the form 

$$R = \frac{1}{6(n+2)} + \frac{1}{n+1} \sum_{j=0}^{n} \left(c_j - \frac{j+1}{n+2}\right)^2. \tag{22}$$

We see thus that $R$ is minimized by choosing 

$$c_j = \frac{j+1}{n+2}, \qquad j = 0, 1, \ldots, n, \tag{23}$$

and hence the minimax invariant procedure is to estimate $F(x)$ by 

$$\hat F(x) = \frac{j+1}{n+2}, \qquad X_j \le x < X_{j+1}, \quad j = 0, 1, \ldots, n, \tag{24}$$

where $(X_1, X_2, \ldots, X_n)$ is the ordered sample and $X_0$ and $X_{n+1}$ stand for $-\infty$ 
and $+\infty$ respectively. 

The minimum risk corresponding to this procedure is seen to be $1/[6(n + 2)]$. 
It is of some interest to note that the risk corresponding to the usual procedure 
of taking $c_j = j/n$ is given by $1/(6n)$. 
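
Both risk values can be reproduced exactly. The sketch below is not the paper's derivation: it relies on the standard fact that, for a uniform sample, $P(X_j \le x < X_{j+1}) = \binom{n}{j} x^j (1-x)^{n-j}$, so that $Q_j = \frac{1}{n+1} E(Y - c_j)^2$ with $Y$ distributed as Beta$(j+1, n-j+1)$. The function name is an illustrative choice.

```python
from fractions import Fraction

def risk_squared_loss(n, c):
    """Risk (7) with r = 2 for step heights c[0..n] and a uniform parent.

    Each Q_j equals (variance + squared bias) / (n + 1), where the variance
    and mean are those of the Beta(j + 1, n - j + 1) distribution.
    """
    total = Fraction(0)
    for j in range(n + 1):
        mean = Fraction(j + 1, n + 2)
        var = Fraction((j + 1) * (n - j + 1), (n + 2) ** 2 * (n + 3))
        total += (var + (mean - c[j]) ** 2) / (n + 1)
    return total

n = 10
minimax = [Fraction(j + 1, n + 2) for j in range(n + 1)]   # heights from (23)
usual = [Fraction(j, n) for j in range(n + 1)]             # sample c.d.f. heights
print(risk_squared_loss(n, minimax), risk_squared_loss(n, usual))
```

For every $n$ tried, the first value equals $1/[6(n+2)]$ and the second equals $1/(6n)$.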
4. The loss function $L(F, \hat F) = \int_{-\infty}^{\infty} |F(x) - \hat F(x)|^r\,dF(x)$, where $r$ is any 
positive integer. In this case 

$$R = E \sum_{j=0}^{n} \int_{X_j}^{X_{j+1}} |x - c_j|^r\,dx = \sum_{j=0}^{n} Q_j, \tag{25}$$

where 

$$Q_j = \frac{1}{r+1}\, E\big[(X_{j+1} - c_j)\,|X_{j+1} - c_j|^r - (X_j - c_j)\,|X_j - c_j|^r\big]. \tag{26}$$
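Using (9) we obtain 

$$
E\big[(X_j - c_j)\,|X_j - c_j|^r\big] = j\binom{n}{j} \left[ \int_{c_j}^{1} (y - c_j)^{r+1} y^{j-1}(1 - y)^{n-j}\,dy
- \int_{0}^{c_j} (c_j - y)^{r+1} y^{j-1}(1 - y)^{n-j}\,dy \right], \tag{27}
$$

and similarly, 

$$
E\big[(X_{j+1} - c_j)\,|X_{j+1} - c_j|^r\big] = (n - j)\binom{n}{j} \left[ \int_{c_j}^{1} (y - c_j)^{r+1} y^{j}(1 - y)^{n-j-1}\,dy
- \int_{0}^{c_j} (c_j - y)^{r+1} y^{j}(1 - y)^{n-j-1}\,dy \right]. \tag{28}
$$

From (27) and (28) we obtain 

$$
Q_j = \frac{1}{r+1}\binom{n}{j} \left[ \int_{c_j}^{1} (y - c_j)^{r+1} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy
+ (-1)^r \int_{0}^{c_j} (y - c_j)^{r+1} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy \right]. \tag{29}
$$

Again it is obvious that to minimize $R$ is equivalent to minimizing $Q_j$ for 
each $j$. Further we see that the conditions for differentiation with respect to $c_j$ 
under the integral sign in (29) are satisfied, and we obtain 

$$
\frac{\partial Q_j}{\partial c_j} = -\binom{n}{j} \left[ \int_{c_j}^{1} (y - c_j)^{r} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy
+ (-1)^r \int_{0}^{c_j} (y - c_j)^{r} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy \right], \tag{30}
$$

$$
\begin{aligned}
\frac{\partial^2 Q_j}{\partial c_j^2} &= r\binom{n}{j} \left[ \int_{c_j}^{1} (y - c_j)^{r-1} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy
+ (-1)^r \int_{0}^{c_j} (y - c_j)^{r-1} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy \right] \\
&= \begin{cases}
r(r - 1)\, E \displaystyle\int_{X_j}^{X_{j+1}} |x - c_j|^{r-2}\,dx, & r \ge 2, \\[2ex]
2\dbinom{n}{j} c_j^{\,j}(1 - c_j)^{n-j}, & r = 1.
\end{cases}
\end{aligned}
\tag{31}
$$

Define a function $f$ by $f(c_j) = \partial Q_j / \partial c_j$. We see by straightforward computations 
that 

$$f(0) = -\frac{r\,n!\,(r + j - 1)!}{j!\,(n + r)!} < 0, \qquad
f(1) = \frac{r\,n!\,(r + n - j - 1)!}{(n - j)!\,(n + r)!} > 0.$$

Since from (31) it is seen that, for all $r \ge 2$, $f'(c_j) = \partial^2 Q_j / \partial c_j^2 > 0$ for all real 
$c_j$ (the special case for $r = 1$ is given at the end of this section), $f$ is a strictly 
increasing function of $c_j$ and assumes the value zero for one and only one real 
value of $c_j$, and this value of $c_j$ necessarily lies between zero and one. Thus we 
find that $Q_j$ and hence $R$ is minimized by setting $\partial Q_j / \partial c_j = 0$ and solving for 
$c_j$ the resulting equation 

$$
\int_{c_j}^{1} (y - c_j)^{r} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy
+ (-1)^r \int_{0}^{c_j} (y - c_j)^{r} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy = 0. \tag{32}
$$

Thus the problem reduces to that of solving the above equation for 
$j = 0, 1, \ldots, n$. A general solution of (32) giving $c_j$ explicitly in terms of 
$j$, $n$, and $r$ does not seem to be possible. We shall, however, simplify the equation 
so that it should not be too difficult to obtain the solution in any given case. It 
can be proved from (32) that $c_{n-j} = 1 - c_j$, so that the number of 
equations to be solved in practice will be about half the sample size. 

We can write (32) as 

$$
\int_{0}^{1} (y - c_j)^{r} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy
= [1 - (-1)^r] \int_{0}^{c_j} (y - c_j)^{r} y^{j-1}(1 - y)^{n-j-1}(ny - j)\,dy. \tag{33}
$$

The left-hand side of equation (33) can be expressed as 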
(34) Xk (f) (2) c))" “BG +k,n —j +1), 


which indicates that the coefficient of cj is zero. For k ~ 0, we can utilize the 
fact that ({) k = r ({23) and reduce it further to the form 


" : r a + IG+t)---G+tk—-D 
3 B mt ie o ) xej IVF UO Te— 1 
which by making use of the notation introduced in (13) can be written as 


bi sa : ae : _jt+l (r—1)* 
(36) (—1)" rB(j + 1,n —j +1) (« itt) ‘ 
When $r$ is even, the right-hand side of the equation (33) reduces to zero and 
cancelling out the nonzero coefficient $(-1)^r r B(j + 1, n - j + 1)$ from the 
left-hand side as expressed by (36) we obtain $c_j$ as a root of the same equation as 
(19) obtained earlier by a different method. 


The right-hand side of the equation (33), except for the factor [1 — (—1)’], 
can be written as 


(37) xX (;) (-c'* | 2 (—1)""(j + s) ® Ms yttitet dy, 
and by making use of the relation 


(38) & (-'(f) 4 = Bur +0, 


it can be reduced to 
(39) (-1)"r 2 (-1)" (" 34 Ber, j + 8 + Ie. 
s=0 
Using (36) and (37) we can, thus, write the equation (33) as 


; (r—1)* 
BG +1,.-j +1) («, -it}) 
(40) 


= {1 — (-))’ x (—1)’ fe me Bir,j +s + Ich. 


This equation is to be solved for c; to get a minimax invariant procedure for 
estimating F when the loss function is given by (1). When r is even, the factor 
1 — (—1)’ = 0 and we get an equation of degree (r — 1). When r is odd, the 
factor 1 — (—1)’ = 2 and the equation reduces to 


Liew (” 7 ") B(r,j +8 + ies 
ae. t 


j + 1 (r—1)* 

— BG + 1,n—5 +) (c -2*2) =0 
which is an equation of degree n + r. In either case there is one and only one 
real root which lies between 0 and 1 and the set of such roots for7 = 0,1, --- ,n 
minimizes R. 

An alternative way of expressing the right-hand side of (33) is to rewrite (39) 
in the form: 


anf i r+j+s 
7a ont» av 8 oa Cj 
(42) (—1y"rt 2 '( s reer ere oa Ta 


It is easily verified that (42) is equal to 


r—1 > s({7 ay eo ” j+s 
(—1)""r! > (-1) ( ; )J | | zi dz -:- dz, 
0 0 0 


(43) ej Zr 22 
= (-1n ff fda — ai des ++ der. 
The equation (40) can, therefore, also be expressed as 


: sc i + 1\°-"" 
Bj + 1,1” i+ v(« n+2 
(44) 


-o-(-nie-o ff". [aa - ada 


Special case for $r = 1$. When $r = 1$, (30) is easily seen to reduce to 

$$\frac{\partial Q_j}{\partial c_j} = \binom{n}{j} \left[ 2\int_{0}^{c_j} x^{j}(1 - x)^{n-j}\,dx - B(j + 1, n - j + 1) \right], \tag{45}$$

from which follows easily the result given in (31), viz., 

$$\frac{\partial^2 Q_j}{\partial c_j^2} = 2\binom{n}{j} c_j^{\,j}(1 - c_j)^{n-j}. \tag{46}$$

Setting $\partial Q_j / \partial c_j = 0$ and solving we obtain $c_j$ as the median of the beta 
distribution with density 

$$g(x) = \frac{1}{B(j + 1, n - j + 1)}\, x^{j}(1 - x)^{n-j}, \qquad 0 \le x \le 1, \tag{47}$$

for $j = 0, 1, 2, \ldots, n$. Since (46) shows that $\partial^2 Q_j / \partial c_j^2 > 0$ for $0 < c_j < 1$, it 
follows that this solution for $c_j$ in fact minimizes $Q_j$ for $j = 0, 1, \ldots, n$, and 
hence minimizes $R$. The equation (44) for $c_j$ obtained for $r \ge 2$ thus holds good 
for $r = 1$ as well and the minimax invariant procedure is seen to estimate $F(x)$ by 

$$\hat F(x) = c_j, \qquad X_j \le x < X_{j+1}, \quad j = 0, 1, \ldots, n, \tag{48}$$

where $(X_1, X_2, \ldots, X_n)$ is the ordered sample, $X_0$ and $X_{n+1}$ stand for $-\infty$ and 
$+\infty$ respectively, and $c_j$ ($j = 0, 1, \ldots, n$) is the median of the beta distribution 
with density (47). It is rather interesting to note that the value 
$(j + 1)/(n + 2)$ for $c_j$ obtained in the last section for $r = 2$ is the mean of the 
same beta distribution. 

The actual computation of the values of $c_j$ ($j = 0, 1, \ldots, n$) can be easily 
carried out, for a given $n$, with the help of the tables of the incomplete beta 
function [2]. In the notation of those tables 

$$I_x(p, q) = \frac{\int_{0}^{x} x^{p-1}(1 - x)^{q-1}\,dx}{\int_{0}^{1} x^{p-1}(1 - x)^{q-1}\,dx}. \tag{49}$$

Thus we have to find the value of $x$ for each $j$ such that 

$$I_x(j + 1, n - j + 1) = \tfrac12. \tag{50}$$

Using the relation 

$$I_x(p, q) = 1 - I_{1-x}(q, p), \tag{51}$$

it is seen that as in the general case, 

$$c_{n-j} = 1 - c_j. \tag{52}$$

The values of $c_j$ ($j = 0, 1, \ldots, n$) for $n = 1, 2, \ldots, 12$ correct to two decimal 
places are computed and tabulated as shown in Table I. 

TABLE I 
Values of $c_j$ ($j = 0, 1, \ldots, n$) for $n = 1, 2, \ldots, 12$ 
[Table entries not recoverable from the source.] 
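
Where tables of the incomplete beta function are not at hand, (50) can be solved numerically. The following is a minimal sketch rather than the author's procedure: for integer arguments it evaluates $I_x(a, b)$ through the binomial-tail identity $I_x(a, b) = P(\mathrm{Bin}(a + b - 1, x) \ge a)$ and finds the median by bisection. The function names and the tolerance are assumptions of this sketch.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a and b."""
    m = a + b - 1
    return sum(comb(m, i) * x**i * (1 - x)**(m - i) for i in range(a, m + 1))

def beta_median(a, b, tol=1e-12):
    """Solve I_x(a, b) = 1/2 for x by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def step_heights(n):
    """c_j, j = 0..n: medians of the Beta(j + 1, n - j + 1) densities (47)."""
    return [beta_median(j + 1, n - j + 1) for j in range(n + 1)]

for n in (1, 2, 3):
    print(n, [round(c, 2) for c in step_heights(n)])
```

The symmetry (52) provides a convenient check on the computed values.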


5. The loss function $L(F, \hat F) = \int_{-\infty}^{\infty} (F(x) - \hat F(x))^r / \{F(x)[1 - F(x)]\}\,dF(x)$ 
where $r$ is any positive even integer. 
Let $r = 2s$; then 

$$R = E \sum_{j=0}^{n} \int_{X_j}^{X_{j+1}} \frac{(x - c_j)^{2s}}{x(1 - x)}\,dx = \sum_{j=0}^{n} Q_j, \tag{53}$$

where 

$$Q_j = E \int_{X_j}^{X_{j+1}} \frac{(x - c_j)^{2s}}{x(1 - x)}\,dx. \tag{54}$$

Since $X_0 = 0$ and $X_{n+1} = 1$, it is clear that in order to obtain finite risk it is 
necessary and sufficient that $c_0 = 0$ and $c_n = 1$. For $j \ne 0, n$, we can write 


2s—2 


p> 


2s 
imo h+1 an(X3hi — XP") + cj*(log Xj. — log X;) 


(55) 


— (1 = 6) flog (1 = Xiu) — log - x1], 


where 


(56) . ; yi. bee ont 
{ =() 
The probability density of $X_j$ is given by (9), from which we obtain 

$$E(\log X_j) = j\binom{n}{j} \int_{0}^{1} y^{j-1}(1 - y)^{n-j} \log y\,dy. \tag{57}$$

In order to evaluate (57) we use the following lemma. 

LEMMA 5.1. 

$$\int_{0}^{1} y^{j-1}(1 - y)^{n-j} \log y\,dy = \frac{\Gamma(j)\,\Gamma(n - j + 1)}{\Gamma(n + 1)}\,[\psi(j) - \psi(n + 1)], \tag{58}$$

where $\psi(k) = \Gamma'(k) / \Gamma(k)$. 

PROOF. Let $f(a) = \int_{0}^{1} y^{a-1}(1 - y)^{n-j}\,dy$. The left-hand side of (58) is $f'(a)$ 
evaluated at $a = j$, as can be seen by differentiating under the integral sign. But 
$f(a) = \Gamma(a)\Gamma(n - j + 1)/\Gamma(a + n - j + 1)$, and the desired result is obtained 
by evaluating the logarithmic derivative of $f(a)$ at $a = j$. 

From Lemma 5.1 and (57) we get 

$$E(\log X_j) = \psi(j) - \psi(n + 1). \tag{59}$$

In the same way, we obtain 

$$E \log(1 - X_j) = \psi(n - j + 1) - \psi(n + 1). \tag{60}$$

Further, since $\Gamma(k + 1) = k\Gamma(k)$ and $\Gamma'(k + 1) = \Gamma(k) + k\Gamma'(k)$, we see that 
$\psi(k + 1) = \Gamma'(k + 1)/\Gamma(k + 1) = 1/k + \psi(k)$, and hence the function $\psi$ 
satisfies the difference equation 

$$\psi(k + 1) - \psi(k) = 1/k. \tag{61}$$

From (59), (60), and (61) we get 

$$E(\log X_{j+1} - \log X_j) = 1/j, \qquad \text{for } j = 1, 2, \ldots, n, \tag{62}$$

and 

$$E[\log(1 - X_{j+1}) - \log(1 - X_j)] = -1/(n - j), \qquad \text{for } j = 0, 1, \ldots, n - 1. \tag{63}$$
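
Relations (59) and (62) can be checked numerically. The sketch below is illustrative only: it uses the known identity $\psi(k) = -\gamma + \sum_{i=1}^{k-1} 1/i$ for positive integers $k$, and approximates the integral in (57) by a midpoint rule. The step count, the tolerance implied by it, and the function names are assumptions of this sketch.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma_int(k):
    """psi(k) for a positive integer k, via harmonic numbers."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, k))

def expected_log_order_stat(j, n, steps=20000):
    """Midpoint-rule approximation of E(log X_j) from (57), for 1 <= j <= n."""
    coef = j * math.comb(n, j)        # equals 1 / B(j, n - j + 1)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += math.log(y) * y ** (j - 1) * (1 - y) ** (n - j)
    return coef * h * total

n = 5
print(expected_log_order_stat(2, n), digamma_int(2) - digamma_int(n + 1))
print(expected_log_order_stat(3, n) - expected_log_order_stat(2, n))
```

The first line compares the two sides of (59) for $j = 2$; the second approximates the left-hand side of (62), which should be close to $1/2$.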
Substituting from (11), (62), and (63) in (55) we get 


iS (Gj + A)! n! 1 c;" 2s 
(64) @= 2 ategpy? aT? +o - a, 


and re from (56), we can write 

ni G + h)! — <— ) 1 28 1 28 
(65) Q; = > (—1) G+ 565 +s" - a. 
This is a a degree polynomial in c; . Collecting the lis of like powers 
of c; we obtain, for k = 0, 1, 2,--- , 2s — 2, 


2s—2 


(66) @ = Ts qe - ja + Bhnes, 
where 


rr (28) Pn *S5*_ Gj + 2)! 
cig ead Ce) LF & w+h+ i ==]: 


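To simplify (66) further, we state and prove the following lemma. 

LEMMA 5.2. If $j$ and $n$ are positive integers and $j < n$, then 

$$\frac{n!}{j!} \sum_{h=0}^{q} \frac{(j + h)!}{(n + h + 1)!} = \frac{1}{n - j} \left[ 1 - \prod_{a=1}^{q+1} \frac{j + a}{n + a} \right]. \tag{68}$$

PROOF. The left-hand side is equal to 

$$
\binom{n}{j} \sum_{h=0}^{q} B(j + h + 1, n - j + 1)
= \binom{n}{j} \sum_{h=0}^{q} \int_{0}^{1} x^{j+h}(1 - x)^{n-j}\,dx
= \binom{n}{j} \int_{0}^{1} \big(x^{j} - x^{j+q+1}\big)(1 - x)^{n-j-1}\,dx
$$

= the right-hand side, after simplification. 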
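
Identity (68) is a finite sum of rationals, so it can be verified exactly for small arguments. The following sketch (function names are illustrative) evaluates both sides with exact fractions over a range of $j$, $n$ and $q$.

```python
from fractions import Fraction
from math import factorial

def lemma_lhs(j, n, q):
    """(n!/j!) * sum_{h=0}^{q} (j+h)! / (n+h+1)!"""
    s = sum(Fraction(factorial(j + h), factorial(n + h + 1)) for h in range(q + 1))
    return Fraction(factorial(n), factorial(j)) * s

def lemma_rhs(j, n, q):
    """(1/(n-j)) * [1 - prod_{a=1}^{q+1} (j+a)/(n+a)]"""
    prod = Fraction(1)
    for a in range(1, q + 2):
        prod *= Fraction(j + a, n + a)
    return (1 - prod) / (n - j)

# Compare both sides for all 1 <= j < n <= 7 and 0 <= q <= 5.
ok = all(lemma_lhs(j, n, q) == lemma_rhs(j, n, q)
         for n in range(2, 8) for j in range(1, n) for q in range(6))
print(ok)
```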
Substituting in (67) from (68) when g = 2s — 2 — k, we obtain 


2s ee ita 
(69) ge = (—1)* +5 (*) II xz fork = 0,1,2,---, 
a=l 


and substituting now in (66) we obtain 


Q; = =——s G + : S (—e;)* TL its] 


(70) a= 1 + Qa 


“ep 


using the notation introduced in (13). 
Now with the same reasoning as in Section 3 it will be seen that $Q_j$ and hence 
$R$ is minimized by setting $\partial Q_j / \partial c_j = 0$ and solving for $c_j$ the resulting equation 

$$(c_j - j/n)^{(2s-1)*} = 0. \tag{71}$$

This equation, by the same argument as in Section 3, has one and only one real 
root which lies between 0 and 1. Since for $j = 0$, (71) reduces to $c_0^{2s-1} = 0$ giving 
$c_0 = 0$ as the only real root, and for $j = n$, it reduces to $(c_n - 1)^{2s-1} = 0$, giving 
$c_n = 1$ as the only real root, it follows that the minimax invariant 
procedure for the loss function of this section is to estimate $F(x)$ by 

$$\hat F(x) = c_j, \qquad X_j \le x < X_{j+1}, \quad j = 0, 1, \ldots, n,$$

where $X_j$, $j = 0, 1, \ldots, n + 1$, have been defined earlier and $c_j$ is the real 
root of (71). Again the number of equations to be solved in practice will be 
about half the sample size since it can be easily seen that (71) remains unchanged 
by replacing $j$ by $n - j$ and $c_j$ by $1 - c_j$, so that $c_{n-j} = 1 - c_j$. 

Special case for $r = 2$. When $r = 2$, the equation (71) reduces, for each $j$, to 
a linear equation 

$$(c_j - j/n)^{1*} = 0,$$

which has the unique solution $c_j = j/n$. This can also be seen by using (55), 
(70), (62), and (63) for $r = 2$ and writing the risk $R$ in the form 

$$R = \frac1n + \sum_{j=1}^{n-1} \frac{n}{j(n - j)} \left(c_j - \frac{j}{n}\right)^2. \tag{72}$$

Thus the minimax invariant estimate $\hat F$ for the loss function in the special case 
$r = 2$ of this section turns out to be the usual sample cumulative function 

$$\hat F(x) = c_j = j/n, \qquad \text{when } X_j \le x < X_{j+1}, \quad j = 0, 1, \ldots, n, \tag{73}$$

where $X_1 < X_2 < \cdots < X_n$ is an ordered sample from the c.d.f. $F$, $X_0$ and 
$X_{n+1}$ standing for $-\infty$ and $+\infty$ respectively. The actual value of the risk 
corresponding to this estimate is $1/n$. 
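
The value $1/n$ can be reproduced with exact arithmetic. The sketch below is a check under stated assumptions, not the paper's computation: for $1 \le j \le n - 1$ it writes $Q_j$ as $\frac{n}{j(n-j)} E(Y - c_j)^2$ with $Y$ distributed as Beta$(j, n - j)$, and it uses $Q_0 = Q_n = 1/[n(n+1)]$ when $c_0 = 0$ and $c_n = 1$. Both facts follow from $P(X_j \le x < X_{j+1}) = \binom{n}{j}x^j(1-x)^{n-j}$; the function name is illustrative.

```python
from fractions import Fraction

def risk_weighted_loss(n, c):
    """Risk (53) with r = 2; requires c[0] = 0 and c[n] = 1 for a finite value."""
    assert c[0] == 0 and c[n] == 1
    total = Fraction(2, n * (n + 1))            # Q_0 + Q_n
    for j in range(1, n):
        weight = Fraction(n, j * (n - j))       # C(n, j) * B(j, n - j)
        mean = Fraction(j, n)                   # mean of Beta(j, n - j)
        var = Fraction(j * (n - j), n * n * (n + 1))
        total += weight * (var + (mean - c[j]) ** 2)
    return total

n = 10
sample_cdf = [Fraction(j, n) for j in range(n + 1)]
print(risk_weighted_loss(n, sample_cdf))
```

Any change to an interior $c_j$ adds the corresponding squared term of (72) to the risk.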


6. The loss function $L(F, \hat F) = \int_{-\infty}^{\infty} |F(x) - \hat F(x)| / \{F(x)[1 - F(x)]\}\,dF(x)$. 
In this case we obtain 

$$R = E \sum_{j=0}^{n} \int_{X_j}^{X_{j+1}} \frac{|x - c_j|}{x(1 - x)}\,dx = \sum_{j=0}^{n} Q_j, \tag{74}$$

where 

$$Q_j = E \int_{X_j}^{X_{j+1}} \frac{|x - c_j|}{x(1 - x)}\,dx. \tag{75}$$

As in the last section, it will be seen that for finite risk the necessary and 
sufficient condition is that $c_0 = 0$ and $c_n = 1$. For $j \ne 0, n$, we obtain 

$$
\begin{aligned}
Q_j = E\big[\, & c_j\,|\log c_j - \log X_j| - c_j\,|\log c_j - \log X_{j+1}| \\
& + (1 - c_j)\,|\log(1 - c_j) - \log(1 - X_{j+1})| \\
& - (1 - c_j)\,|\log(1 - c_j) - \log(1 - X_j)|\,\big].
\end{aligned}
\tag{76}
$$

The distribution of $X_j$ has probability density $p(y)$ given by (9) and the 
distribution of $X_{j+1}$ has the probability density 

$$q(y) = \frac{1}{B(j + 1, n - j)}\, y^{j}(1 - y)^{n-j-1}. \tag{77}$$

Using (9) and (77) we can express $Q_j$ in the form 

$$Q_j = \binom{n}{j} \left[ \int_{0}^{c_j} g(c_j, y)\,dy - \int_{c_j}^{1} g(c_j, y)\,dy \right], \tag{78}$$

where 

$$
g(c_j, y) = [c_j \log c_j + (1 - c_j)\log(1 - c_j) - c_j \log y - (1 - c_j)\log(1 - y)]\,
y^{j-1}(1 - y)^{n-j-1}(j - ny). \tag{79}
$$

Straightforward integration leads to 

$$
\begin{aligned}
\int g(c_j, y)\,dy = {} & y^{j}(1 - y)^{n-j}\,[c_j(\log c_j - \log y) + (1 - c_j)(\log(1 - c_j) - \log(1 - y))] \\
& + \int (c_j - y)\, y^{j-1}(1 - y)^{n-j-1}\,dy + \text{constant},
\end{aligned}
\tag{80}
$$

which enables us to obtain $Q_j$ as 

$$
Q_j = \binom{n}{j} \left[ \int_{0}^{c_j} (c_j - y)\, y^{j-1}(1 - y)^{n-j-1}\,dy
- \int_{c_j}^{1} (c_j - y)\, y^{j-1}(1 - y)^{n-j-1}\,dy \right], \tag{81}
$$

for $j = 1, 2, \ldots, n - 1$. Since $Q_0$ and $Q_n$ are fixed, and each $Q_j$ is positive and 
depends only on $c_j$, minimizing $R$ is equivalent to minimizing $Q_j$ for each $j$. We 
see that 

$$
\frac{\partial Q_j}{\partial c_j} = \binom{n}{j} \int_{0}^{c_j} y^{j-1}(1 - y)^{n-j-1}\,dy
- \binom{n}{j} \int_{c_j}^{1} y^{j-1}(1 - y)^{n-j-1}\,dy, \tag{82}
$$

$$\frac{\partial^2 Q_j}{\partial c_j^2} = 2\binom{n}{j} c_j^{\,j-1}(1 - c_j)^{n-j-1}. \tag{83}$$

Setting $\partial Q_j / \partial c_j = 0$ and solving we obtain $c_j$ as the median of the beta distribution 
with density 

$$h(x) = \frac{1}{B(j, n - j)}\, x^{j-1}(1 - x)^{n-j-1}, \qquad 0 \le x \le 1, \tag{84}$$

for $j = 1, 2, \ldots, n - 1$. Since (83) shows that $\partial^2 Q_j / \partial c_j^2 > 0$ for $0 < c_j < 1$, it 
follows that this solution for $c_j$ in fact minimizes $Q_j$ and hence minimizes $R$. 
To summarize, the minimax invariant procedure for the loss function considered 
in this section is to estimate $F(x)$ by 

$$\hat F(x) = c_j, \qquad X_j \le x < X_{j+1}, \quad j = 0, 1, \ldots, n, \tag{85}$$

where $X_j$, $j = 0, 1, \ldots, n + 1$, have been defined earlier, $c_0 = 0$, $c_n = 1$ and 
for $j = 1, 2, \ldots, n - 1$, $c_j$ is the median of the beta distribution with density 
(84). Again it is interesting to note that the value $j/n$ for $c_j$ obtained in the last 
section for $r = 2$ is the mean of the same beta distribution. 

Further it is obvious that $c_{n-j} = 1 - c_j$ and only about half the total number 
of $c$ values are to be actually computed. These can be obtained with the help 
of the tables of the incomplete beta-function [2] as indicated in Section 4. However, 
if a table for $c$ values like Table I has been constructed, no fresh computations 
are needed, since the value of $c_j$ ($j = 1, 2, \ldots, n - 1$) for any $n$ in this 
case is equal to the value of $c_{j-1}$ for $n - 2$ in Table I. For example, when $n = 10$, 
the values of $c_j$ ($j = 0, 1, \ldots, 10$) correct to two decimal places are 

(86) 0, .07, .18, .28, .39, .50, .61, .72, .82, .93, 1. 


I am thankful to Professors Z. W. Birnbaum and H. Rubin for some helpful 
discussions during the preparation of this paper. 

1. Summary. The authors were prompted by a general problem concerning hit 
probabilities arising in military operations to seek the distribution of 
$Q_k = \sum_{i=1}^{k} a_i x_i^2$, $k = 2, 3$, where the $x_i$ are normally and independently 
distributed with zero mean and unit variance, $\sum a_i = 1$, and $a_i > 0$. While the 
distribution of a positive definite quadratic form in independent normal variates 
has been the subject of several papers in recent years [6], [11], [12], laborious 
computations are required to prepare from existing results the percentiles of the 
distribution and a table of hit probabilities. This paper discusses the exact 
distribution of $Q_k$ and then obtains and tabulates the distributions of $Q_2$ and $Q_3$, 
accurate to four places. Three other approaches to the distributions are discussed 
and compared with the exact results: a derivation by Hotelling [8], the 
Cornish-Fisher asymptotic approximation [3], and the approximation obtained 
by replacing the quadratic form with a chi-square variate whose first two moments 
are equated to those of the quadratic form, a type of approximation 
used in components of variance analysis. The exact values and the approximations 
are given in Tables I and II. The tables have been prepared with the original 
problem in mind, but also serve as an aid in several problems arising out of quite 
different contexts [1], [2], [13]. These are discussed in Section 6. 


2. Introduction. A general class of problems arises in military operations when 
the hit probability of a weapon depends on the combination of two random 
errors. Suppose random errors in predicted location or predicted position of target 
and random errors in aim of weapon occur. For purposes of exposition let us limit 
ourselves to errors in two dimensions. Denote the true position of a target by $T$, 
the predicted position, or point of aim, by $A$, and the point of impact of a weapon 
aimed at $A$ by $J$. Let $x_1, y_1$ be the components of the vector $TA$ and $x_2, y_2$ 
the components of the vector $AJ$. If we denote the radius of effectiveness of the 
weapon by $R$, then the probability of a hit $P$ is the probability that the resultant 
vector $TJ$ has length no greater than $R$, or 

$$P = P\{x_3^2 + y_3^2 \le R^2\}, \tag{1}$$

where $x_3 = x_1 + x_2$, $y_3 = y_1 + y_2$. 


Received July 2, 1954; revised April 18, 1955. 

1 The tables in this report were computed at Columbia University and Stanford 
University with the partial support of Office of Naval Research contracts N6onr 271 Task 
Order II (NR-042-034) and N6onr 251 Task Order III (NR-042-993). 

TABLE I 
$P\{Q_2 \le t\}$ 
[Table entries not recoverable from the source.] 

First entry in cell is exact to 4 decimal places. 

Second entry is Hotelling’s result. 

Third entry is ‘‘components of variance’’ chi square approximation. 
Fourth entry is Cornish-Fisher result. 


Now assume that the two random errors are each subject to a bivariate normal 
distribution with zero means and with covariance matrices $\|{}_1\sigma_{ij}\|$ and $\|{}_2\sigma_{ij}\|$ 
respectively. Then $x_3$ and $y_3$ are components of a vector having a bivariate normal 
distribution with zero means and covariance matrix $\|{}_1\sigma_{ij} + {}_2\sigma_{ij}\| = \|\lambda_{ij}\|$. For 
the present, assume the components of each error to be independent; i.e., $\|{}_1\sigma_{ij}\|$ 
and $\|{}_2\sigma_{ij}\|$ are diagonal. This restriction, which is not essential, implies that $x_3$ 
and $y_3$ are independently distributed. If $x = \lambda_{11}^{-1/2} x_3$ and $y = \lambda_{22}^{-1/2} y_3$, then $x^2$ and 
$y^2$ each have a chi-square distribution with one degree of freedom. We may then 
write 

$$P = P\{a_1 x^2 + a_2 y^2 \le t\} \tag{2}$$

TABLE II 
$P\{Q_3 \le t\}$ 
[Table entries not recoverable from the source.] 

First entry in cell is exact to 4 decimal places. 

Second entry is Hotelling’s result. 

Third entry is ‘‘components of variance’’ chi square approximation. 
Fourth entry is Cornish-Fisher result. 


where $\sigma^2 = \lambda_{11} + \lambda_{22}$, $a_i = \lambda_{ii}/\sigma^2$ and $t = R^2/\sigma^2$. In the three-dimensional situation, we get by the same argument


(3) $P = P\{a_1 x'^2 + a_2 y'^2 + a_3 z'^2 \le t\},$

where this time $\sigma^2 = \lambda_{11} + \lambda_{22} + \lambda_{33}$. Similarly, if we leave physical reality, we obtain in $k$ dimensions


(4) $P = P\{\sum_{i=1}^{k} a_i x_i^2 \le t\} = P\{Q_k \le t\},$


where $\sigma^2 = \sum_{i=1}^{k} \lambda_{ii}$. Now remove the restriction of independence of errors; that is, let the covariance matrix be an arbitrary positive definite matrix. Then there exists a real non-singular linear transformation [4], $Y = CX$, such that the covariance matrix in the new variables $y_i$ is the unit matrix, and $Q_k$ has the form $\sum_i a_i y_i^2$, where the $a_i$ are the roots of the determinantal equation $|A - a\Lambda^{-1}| = 0$, and are all positive, $A$ is the matrix of the coefficients of $Q_k$ considered as a form in the variables $x_i$, and $\Lambda$ is the covariance matrix $\{\lambda_{ij}\}$ in these variables.


Thus in this paper only (4) is discussed since all other situations can be reduced 
to it. 


3. Exact distribution. Consider the positive definite quadratic form $Q_k = \sum_{i=1}^{k} a_i x_i^2$, where the $x_i$ are normally and independently distributed about zero with unit variance, $\sum a_i = 1$, and $0 < a_i \le a_{i+1}$. Denote by $F_k(t)$ the distribution function $F_k(t) = P\{Q_k \le t\}$, and by $f_k(t)$ the probability density. Then the Laplace transform $\phi_k(p)$ of $f_k(t)$ is


(5) $\phi_k(p) = \prod_{i=1}^{k} (1 + 2a_i p)^{-1/2}.$


From this, $f_k(t)$ and $F_k(t)$ can be obtained in various forms. The authors are including only those which appear most efficient for computing purposes. The following approach was found most useful. Inverting the transform (5) we obtain


(6) $f_k(t) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{tp}\phi_k(p)\,dp.$


We now apply Cauchy's theorem to the integrand in (6) taken along the closed contour from $-iR$ to $iR$ along the imaginary axis, from $iR$ to $-R$ along a quarter circle around the origin, from $-R$ to $-1/2a_k$ and back along the negative real axis with small clockwise semicircular indentations of radius $r$ to avoid the singularities $-1/2a_i$, and from $-R$ back to $-iR$ along a quarter circle around the origin. Letting $R \to \infty$ and $r \to 0$, we obtain

—1/2ae, 


(7) fal) = D9 fe ulp) ap, 
k p—l1/2a,; 
fan = a - e'’buss(p) dp 
(8) 


n=l 


1 k —1/2a on +1 
+<- » (-1)** [ e' bon41(p) dp. 
T —1/2a en 


We now let $c_i = 1/a_i$, and make the changes of variables
$p = p_1(x, t) = -\tfrac12 c_1 - x^2/t$, $(-\infty < p < -\tfrac12 c_1)$,
$p = p_n(x) = \tfrac14 (c_{n-1} - c_n)x - \tfrac14 (c_{n-1} + c_n)$, $(-\tfrac12 c_{n-1} < p < -\tfrac12 c_n)$.
For even index we obtain


(—1)* 2k al 1k dx 
(9) Sax(t = ot ve} [ 7 (—1) Gon(x, t, 2k) V1 — 2’ 


n=l
where


(10) G,(z, tk) = ™™ II _, em + 2p,(x)]”. 


m= 1m n— 


Integrating (9), we get


Qi) Fx) =14+ 62 {I val fo - 1" Sa) 


Similarly, for odd index, 
yt (2 
feu) = th ve} 
(12) 
4 > (—1) "Gonzs(z, t, 2k + 1) <7 at Tor+i(0), 


where 


_yyk (2k+1 k-4 x ‘ 
(13) tan = SY (I ve} (5) ot [Hee as, 


j=l 


and 


(14) H(z, t, k) = i [x* + 4(c - Cmz1)t}* 


Integrating (8), we get 


Fay() = 1+ = a ve} 


j=l 


(15) 
n Gonyi(z, t, 2k + 1) dx 
f me 1) ine nes mane lied Red), 


where 


_ (-1)* fT nv? a RED 
(16) Raw) = On (I va} (4) € as e” dz. 


The integrals over the interval $(-1, 1)$ are readily computed using the quadrature formula [16]


(17) $\int_{-1}^{1} f(x)\,\frac{dx}{\sqrt{1 - x^2}} = \lim_{n\to\infty} \frac{\pi}{n}\sum_{i=1}^{n} f(x_i^{(n)}),$

where $x_i^{(n)}$ are the zeros of the Tchebycheff polynomials $T_n(x)$ of degree $n$. Similarly, the zeros $y_i^{(n)}$ and Christoffel numbers $\alpha_i^{(n)}$ of the Hermite polynomials [14] can be used in computing $r_k(t)$ and $R_k(t)$ with the quadrature formula [14], [16]


(18) $\int_{-\infty}^{\infty} e^{-y^2} f(y)\,dy = \lim_{n\to\infty}\sum_{i=1}^{n} \alpha_i^{(n)} f(y_i^{(n)}).$

The terms $r_k(t)$ and $R_k(t)$ are usually small unless $t$ is also small, or the two largest coefficients $c_1$ and $c_2$ are almost equal. Except under these conditions, they can generally be shown to be negligible by the inequalities


(19) lroe4a(t) |? <xalll - Sows a} re, 
+1 


— Cm 


(20) Res) |? << — 2 {it Sees ate brie, 


m=l Cy — Cm+1 


which are obtained from (13) and (16) by making use of 
2k 
H(z, 4 k)| SIT {4 — cmadt}, —Iav(@, 4)| 2 der. 
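For the original two-dimensional problem, we obtain from (9) and (11),


(21) $f_2(t) = \frac{1}{2\pi}\sqrt{c_1 c_2}\, e^{-\frac14 (c_1 + c_2)t}\int_{-1}^{1} e^{\frac14 (c_1 - c_2)tx}\,\frac{dx}{\sqrt{1 - x^2}},$


(22) $F_2(t) = 1 - \frac{2}{\pi}\sqrt{c_1 c_2}\, e^{-\frac14 (c_1 + c_2)t}\int_{-1}^{1} \frac{e^{\frac14 (c_1 - c_2)tx}}{(c_1 + c_2) - (c_1 - c_2)x}\,\frac{dx}{\sqrt{1 - x^2}},$


which can be simplified to

(23) $f_2(t) = \tfrac12\sqrt{c_1 c_2}\, e^{-\frac14 (c_1 + c_2)t}\, I_0[\tfrac14 (c_1 - c_2)t],$


(24) $F_2(t) = \frac{2\sqrt{c_1 c_2}}{c_1 + c_2}\int_0^{\frac14 (c_1 + c_2)t} e^{-u}\, I_0\!\left[\frac{c_1 - c_2}{c_1 + c_2}\,u\right] du,$


where $I_0$ is the modified Bessel function of order zero. Although (23) is analytically preferable to (21), (22) is easier to evaluate numerically than (24) except for very small values of $t$.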
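
As an illustration of how (22) and the quadrature rule (17) combine, the following is a minimal sketch, assuming only the Python standard library. The function name `f2_cdf` and the default of 64 quadrature nodes are choices made here for illustration and are not taken from the paper.

```python
import math

def f2_cdf(a1, a2, t, n=64):
    """Approximate P{a1*x^2 + a2*y^2 <= t} from (22), using the
    n-point Tchebycheff rule (17) for the integral over (-1, 1).
    Assumes a1 + a2 = 1 and a1, a2 > 0 (illustrative helper)."""
    c1, c2 = 1.0 / a1, 1.0 / a2
    s, d = c1 + c2, c1 - c2
    total = 0.0
    for i in range(1, n + 1):
        # zeros of the Tchebycheff polynomial T_n
        x = math.cos((2 * i - 1) * math.pi / (2 * n))
        total += math.exp(0.25 * d * t * x) / (s - d * x)
    integral = math.pi / n * total
    factor = (2.0 / math.pi) * math.sqrt(c1 * c2) * math.exp(-0.25 * s * t)
    return 1.0 - factor * integral

# Equal coefficients reduce to 1 - exp(-t); unequal ones need the quadrature.
print(f2_cdf(0.5, 0.5, 1.0))
print(f2_cdf(0.1, 0.9, 0.8))
```

When $a_1 = a_2 = \tfrac12$ the integrand is constant and the sketch reproduces $1 - e^{-t}$, which gives a quick check on the constants in (22).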

The case k = 3 applies to the original problem in three dimensions. This time 
(12) and (15) become 


1 C1 C2C3_—Hegte,)t 
t) = = ge 
fa(t) 4/292 
ge teste 


2 Senet S TOI 


(25) 
+ r;(t), 
and 

Ft) = 1 — 1 VBace tert 


1 | i 


[(ce +o) + (2 — enaly/2e, — (a +) — (2 — ae VI — & 
+ R,(t) 


(26) 





472 ARTHUR GRAD AND HERBERT SOLOMON 


where 


sata i i 4 —est/2 7 ieee ack A ad ea 
(7) nl) = - 7 Viaaatew | [2 + Hee — a)ilia® + Be — cf)” 


and 


1 
R,(t) = = V 146, 020; Oe” 


, ta A aati dad 
© Lee (x? + egt/2)-/[x? + (cs — cil[x? + 3(es — ca) 


Numerical evaluation of $f_k(t)$ and $F_k(t)$ becomes more difficult if the constants $c_i$ are almost equal. In that case, however, an as yet unpublished method of Hotelling [8] becomes effective. This will be discussed in the next section. On the other hand, for $f_3(t)$, if two of the constants, say $c_j$, actually coincide, then the problem simplifies and we obtain as the inverse transform of (5), [5]


(28) 


(29) A) = de; 4/ 2 1" ext VI oh 


Hence 


1 ; 
(30) F;() =I (Jj ct, — i) ~_ glx e**? erf ~/i(c; — cit, 
where I(u, p) is the incomplete gamma function as tabulated in [10]. The first 
entry of each cell in Tables I and II was obtained from the quadrature formulas 
given above and is correct to four decimal places. 

There is an interesting relationship between the distribution of $Q_2$ and the distribution of the measure of the random set given in [15]. If ${}_a\sigma_{ij} = \sigma_a^2$ for $i = j$ and ${}_a\sigma_{ij} = 0$ for $i \ne j$ and the vector TA mentioned early in the paper is constant, say $D$, the graph labelled Figure 1 in [15] gives the desired probability if we consider the abscissa values equal to $D/\sigma_a$ and the ordinate values equal to $R/\sigma_a$. Let us now return to our present problem but add the further restriction ${}_p\sigma_{ij} = \sigma_p^2$ for $i = j$ and ${}_p\sigma_{ij} = 0$ for $i \ne j$. Then the probability density of $D/\sigma_p$, $h(D/\sigma_p)$, is


(31) $h\!\left(\frac{D}{\sigma_p}\right) = \frac{D}{\sigma_p}\exp\!\left[-\frac12\left(\frac{D}{\sigma_p}\right)^2\right]$


and 


(32) $P\{Q_2 \le t\} = \int_0^{\infty} g\!\left(\frac{R}{\sigma_a}\,\Big|\,\frac{D}{\sigma_a}\right) h\!\left(\frac{D}{\sigma_p}\right) d\!\left(\frac{D}{\sigma_p}\right),$
where $g(R/\sigma_a \mid D/\sigma_a)$ is the probability read from the graph in [15] and the coefficients of $Q_2$ are now both equal to $\tfrac12$. As an illustration, consider the following four situations: (a) $R/\sigma_a = 2$, $\sigma_p/\sigma_a = 3$; (b) $R/\sigma_a = 2$, $\sigma_p/\sigma_a = 1$; (c) $R/\sigma_a = 3$, $\sigma_p/\sigma_a = 2$; (d) $R/\sigma_a = 8$, $\sigma_p/\sigma_a = 1$; then in the table immediately following we get the top entries from Table I, and the bottom entries by numerical integration of (28).


 (a)     (b)     (c)     (d)


.3935   .6321   .7769   .8883
.3971   .6328   .7767   .8955


Thus, since only two place accuracy at best could be obtained by reading 
g(R/o.| D/o.) from the graph, a rather simple numerical integration yields 
values extremely close to the exact values. 


4. Hotelling's method.$^2$ Let $2q = Q_k$ and modify $Q_k$ by requiring $\sum a_i = k = 2m$ so that in our cases of special interest $m = 1$ or $3/2$. The $a_i$ are now the ratios of the latent roots of $Q_k$ to $1/k$ times the trace of the matrix of $Q_k$, where $k$ is the rank. Then Hotelling states that the density of $q$ is,


qs 2 
(33) I@ = “Fmy 2g ole), 
where 


_ rit(m) f°” 
(34) be es | S@Laa) de 


and L,(q) is a Laguerre polynomial defined by 


(35) ba =e (te 7 Se. 


t r—t t! 


Now define 


(36) we = 2 (a; — 1)’. 


2 In a letter to one of the authors [8] in November, 1950, Hotelling outlined his method 
for obtaining the distribution of quadratic forms. This letter was in response to a query 
regarding a talk Hotelling gave in a seminar attended by one of the authors in Berkeley in 
1947. Mention of this research also appears in an abstract by Hotelling in Ann. Math. Stat., 
Vol. 19 (1948), p. 119.
Then


m—1 —@ U2 2q q 
$0) = say a ¢[1- + | 


ml se “a a. ae - wees | 
“an * 5 m(m +1) m(m + 1)(m + 2) 


U2 
+r, a, —— 4q? 
4! m(m + 1) ~~ m(m + 1)(m + 2) 
tom ne HeTs | 
m(m + 1)(m + 2)(m + 3) J 
_ 12s + Su, ts E _ 5q + 10q° = 10q° 
5! m= mm+i1)  m(m + 1)(m + 2) 


4 


5 
‘iat 1)---(m+3) mm+l1)--- any 
+ further terms requiring higher moments of the normal distribution. 


Rearranging Hotelling’s terms to make optimum use of the Hartley-Pearson 
Tables [7], we get 


FQ) = Plz S$ 2@-lL+a—-dt+d—d] 
+ P{xi < 2t}-[—2Qd. + 3d; — 4d, + 5d;] 
+ P{x§ S 2t}-[d, — 3ds + 6d, — 10d;] 
+ Pixs S 2t}-[ds — 4d, + 10d, 
+ P{xio S 2t}-[d. — 5ds] 
+ P{zxiz S 2t}-[ds} 


where 


a= 7, 2 = Hu + 4), ds = rho(12us + Sum), 


and $\chi_n^2$ is a chi-square variate with $n$ degrees of freedom. The values obtained by this method using (34) are quite accurate. Using the fixed number of terms in (34), the departure from the exact value depends on the variance of the $a_i$'s. This is noted by a glance at the second entry in each cell of Tables I and II having more than one entry. Thus this method complements the method given in Section 3 precisely in those cases where the most numerical difficulty is experienced; namely, when the variance in the $a_i$'s is small.

5. Approximations. Where a third entry appears in a cell of Tables I and II, it is an approximation obtained in the following way. Let $Q_k = c\chi_n^2$; this is an approximating device often used in components of variance analysis. Then, equating the first two moments, we get


$cn = \sum_{i=1}^{k} a_i = 1, \qquad c^2 n = \sum_{i=1}^{k} a_i^2.$

Thus $Q_k$ is approximated by $(\sum_1^k a_i^2)\chi_n^2$ where $\chi_n^2$ has $n = 1/\sum_1^k a_i^2$ degrees of freedom. To avoid the interpolation caused by fractional degrees of freedom we can employ the Wilson-Hilferty approximation [17] which states that given a chi-square variate with $n$ degrees of freedom, say $\chi_n^2$, then $(\chi_n^2/n)^{1/3}$ is approximately normally distributed with mean $(1 - 2/9n)$ and variance $2/9n$; thus we may write


(39) $P\{Q_k \le t\} = P\left\{\left(1 - \frac{2}{9n}\right) + z\sqrt{2/9n} \le t^{1/3}\right\}$


as a modified approximation where $z$ is normally distributed with zero mean and unit variance. Finally we get


(40) $P\{Q_k \le t\} = P\left\{z \le \frac{t^{1/3} - \left(1 - \frac29\sum_1^k a_i^2\right)}{\sqrt{\frac29\sum_1^k a_i^2}}\right\}.$


This result, together with Kelley's Tables [9], was used to obtain the third entry in the cells of the tables wherever they appear.
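
The components-of-variance approximation with the Wilson-Hilferty step can be sketched as follows. This is only an illustration under the assumption $\sum a_i = 1$; the function name `wilson_hilferty_cdf` is invented here, and the normal integral is taken from `math.erf` rather than from printed tables.

```python
import math

def wilson_hilferty_cdf(a, t):
    """Approximate P{sum a_i x_i^2 <= t} by the normal deviate in (40).
    Assumes sum(a) == 1, so that n = 1 / sum(a_i^2) (illustrative helper)."""
    s2 = sum(ai * ai for ai in a)          # equals 1/n
    mean = 1.0 - 2.0 * s2 / 9.0            # mean of (chi2_n / n)^(1/3)
    sd = math.sqrt(2.0 * s2 / 9.0)         # its standard deviation
    z = (t ** (1.0 / 3.0) - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# With a = (.5, .5) the exact value is 1 - exp(-t), so the error is visible.
approx = wilson_hilferty_cdf([0.5, 0.5], 1.0)
exact = 1.0 - math.exp(-1.0)
print(approx, exact)
```

For equal coefficients the fitted chi-square distribution is exact, so any discrepancy in this case comes from the cube-root normal approximation alone.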


Where a fourth entry appears in a cell of the tables, it is an approximation 
obtained from the Cornish-Fisher [3] asymptotic expansion of $Q_k$ in terms of a
normal variable. This approximation requires the cumulants of $Q_k$, but these
are easy to obtain from the cumulants of the chi-square variate with one degree 
of freedom by applying the additive properties of cumulants. Computation of 
the values in Tables I and II is based on all terms in the asymptotic expansion 
of orders through 1/k’. 


6. Applications. In discussing applications there is, of course, the obvious one which motivated this paper. As an illustration, assume ${}_a\sigma_{11} = 100$, ${}_a\sigma_{22} = 400$, ${}_p\sigma_{11} = 100$, ${}_p\sigma_{22} = 1400$, and $R = 40$. In this case the usual assumption of circular symmetry is certainly not realistic. Here $\lambda_{11} = 200$, $\lambda_{22} = 1800$, and $\sigma^2 = 2000$, so that $a_1 = .1$, $a_2 = .9$, and $t = R^2/\sigma^2 = .8$. Thus the probability of a hit is read as .6159 from Column 5 in Table I. Moreover, Tables I and II make it possible to compare the relative effects of changes in weapon radius with changes in aiming and location errors.

In [2] it is demonstrated that the usual chi-square tests for goodness of fit do not have a limiting chi-square distribution when the maximum likelihood estimates of the parameters are based on the original observations rather than on the cell frequencies. The asymptotic distribution in this situation is that of


(41) $\sum_{i=1}^{j-s-1} y_i^2 + \sum_{i=j-s}^{j-1} \theta_i y_i^2,$


where $j$ is the number of cells, $s$ is the number of parameters to be estimated, and the coefficients $\theta_i$ are between zero and one and are the roots of a determinantal equation. In the usual "goodness of fit" situation in statistics, distributions rarely contain more than two parameters to be estimated from the data. Thus Tables I and II are singularly appropriate if the number of cells is kept down. In an illustration given in [2],


(42) $P = P\{x_1^2 + .8x_2^2 + .2x_3^2 \ge 3.84\}$


is desired, and $P = .12$ is given as a lower bound. This can be quickly modified so that Table II can be used, for dividing through by two in (42) we get


(43) $P = P\{.5x_1^2 + .4x_2^2 + .1x_3^2 \ge 1.92\}.$


From an Aitken seven point interpolation in the (.5, .4, .1) column in Table II, we get $P = .1344$.
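
The rescaling from (42) to (43) can be illustrated by simulation. The sketch below is a plain Monte Carlo estimate, not the table interpolation used in the text; the function name `tail_prob`, the sample size, and the seed are assumptions made for the example.

```python
import random

def tail_prob(weights, threshold, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P{sum w_i x_i^2 >= threshold} for
    independent standard normal x_i (illustrative helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        q = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if q >= threshold:
            hits += 1
    return hits / n_samples

# Form (42) and its normalized version (43) describe the same event.
p42 = tail_prob([1.0, 0.8, 0.2], 3.84)
p43 = tail_prob([0.5, 0.4, 0.1], 1.92)
print(p42, p43)
```

Because the coefficients in (43) sum to one, the normalized form is the one that matches the layout of Table II; the estimate must also lie between the upper 3.84 tail areas of chi-square with one and with three degrees of freedom.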

In [1], the limiting distribution of $n\omega^2$ is obtained as the distribution of the quadratic form $Q_\infty = \sum_{i=1}^{\infty} a_i x_i^2$ where $a_i = 1/i^2\pi^2$, and $\omega^2$ is the von Mises criterion for goodness of fit between a sample cumulative distribution function and a specified population distribution function. In [13], it is shown that a simple variant of the $\omega^2$ criterion for the two-sample test has the same limiting distribution. While a table of this distribution is given in [1] it should be possible to use Table II to some advantage, even though this means neglecting all terms from $i = 4$ onwards. Since $\sum_1^\infty a_i = \tfrac16 = .1667$ and $\sum_1^3 a_i = 49/36\pi^2 = .1379$, a reasonable upper bound should be given by Table II. For example take $t = .046$, $t = .101$, and $t = .405$, then the table in [1] yields .10, .42, and .93 respectively while from Table II we get using


(44) $P\{x_1^2 + \tfrac14 x_2^2 + \tfrac19 x_3^2 \le \pi^2 t\} = P\{\tfrac{36}{49}x_1^2 + \tfrac{9}{49}x_2^2 + \tfrac{4}{49}x_3^2 \le \tfrac{36}{49}\pi^2 t\}$

that the probabilities are .28, .54, and .94 respectively. These values are obtained by interpolation and are correct to two places. However, the upper bound is not too sharp when $P$ is small. Also Table II is constructed with $t$ as the argument while the table in [1] has $P$ as the argument and thus may be more useful in some
contexts and, of course, less in others.
URN MODELS OF CORRELATION AND A COMPARISON WITH THE
MULTIVARIATE NORMAL INTEGRAL


By J. A. McFadden


U.S. Naval Ordnance Laboratory 


Summary. In a special case of Polya’s urn scheme, the probability that the 
first n draws are all of the same color is interpreted as a function of the (single) 
correlation coefficient. A more general urn model is introduced in which the 
correlation between pairs of results may differ from pair to pair, and again the 
probability of consecutive colors is considered. This result is compared with the 
probability of coincidence in sign under the multivariate normal distribution. 
The comparison suggests a new approximation for the probability in the multi- 
variate normal case. This approximation appears to be useful only in the Polya 
case, where the correlations are all equal. 


1. Introduction. Consider $n$ correlated random variables $x_1, x_2, \cdots, x_n$. If each variable $x_i$ may assume only the values $+1$ and $-1$ and either result is equally probable (a priori), then, in terms of the correlation coefficients between pairs $(x_i, x_j)$, what is the probability that all $n$ variables are positive? An example of such a problem is provided by Polya's urn scheme and by a generalization given in Section 3.

A more difficult problem is the following: Consider $n$ correlated continuous variables $\xi_1, \xi_2, \cdots, \xi_n$, with each having a mean value of zero and symmetry about the mean. If these variables obey a given distribution law (e.g., the multivariate normal distribution), what is the probability that all $n$ variables are simultaneously positive? This second problem may be reduced, in principle, to the first by associating the signs of the $\xi_i$ with the signs of the $x_i$; that is,

(1) $x_i = \begin{cases} 1, & \xi_i \ge 0, \\ -1, & \xi_i < 0; \end{cases} \qquad i = 1, 2, \cdots, n.$

The next two sections are concerned with examples of the first problem 

mentioned above. 


2. Polya’s urn scheme. Consider the symmetric case of Polya’s urn scheme 
({1], [2], [3]), in which an urn contains initially a black balls and a red balls. Suc- 
cessive drawings are performed, with replacement, and with the further provision 
that A extra balls are added after each drawing, all of the same color as the ball 
most recently drawn. A may be negative, but it must obey the inequality, 


(2) $a + (n - 1)A \ge 0,$


where n is the total number of draws, in order that neither color may be 
overdrawn. 


Received November 1, 1954.

The probability of drawing a black ball in the first trial is, of course, $a/2a = \tfrac12$. The probability of drawing two black balls in the first two trials is $a(a + A)/2a(2a + A)$. The probability of drawing $n$ black balls in the first $n$ trials is


(3) $P_n = (a/A)_n / (2a/A)_n,$


where $(\alpha)_n = \alpha(\alpha + 1)(\alpha + 2) \cdots (\alpha + n - 1)$, and $(\alpha)_0 = 1$.

Let $x_i = +1$ if the $i$th draw is black, and let $x_i = -1$ if the $i$th draw is red. Then Polya has shown ([3], p. 140) that the correlation coefficient between $x_i$ and $x_j$ $(i \ne j)$ is

(4) $r = A/(2a + A).$


The result is the same for all possible pairs $(i, j)$. As $A$ varies from $-a/(n - 1)$ to $\infty$, $r$ varies from $-1/(2n - 3)$ to 1.

Equation (4) may be verified easily for the case $i = 1$, $j = 2$. Since the mean values $E(x_i)$ are all zero and the variances $E(x_i^2)$ are all unity, the correlation coefficient between $x_i$ and $x_j$ is simply the expectation $E(x_i x_j)$. For the first two draws, there are four possibilities: $(+1, +1)$, $(+1, -1)$, $(-1, +1)$, and $(-1, -1)$. Then


$r = E(x_1 x_2) = 2\,\frac{a(a + A)}{2a(2a + A)} - 2\,\frac{a \cdot a}{2a(2a + A)} = \frac{A}{2a + A},$

which agrees with (4). The same procedure may be carried out for other pairs $(i, j)$.

$a/A$ may now be eliminated from (3) by equation (4); then, in terms of the correlation alone, the probability that all the $x_i$ are equal to $+1$ is given by


(5) $P_n = ([1 - r]/2r)_n / ([1 - r]/r)_n.$


At this point the integers $a$ and $A$ may be forgotten. Let $r$ assume any value in the range from $-1/(2n - 3)$ to 1, not only the fractional values given by equation (4).
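
The passage from (3) to (5) can be checked with exact rational arithmetic. The following is a minimal sketch; the names `rising`, `polya_direct`, and `polya_from_r`, and the parameter name `added` for the addition parameter $A$, are chosen here for illustration.

```python
from fractions import Fraction

def rising(x, n):
    """Rising factorial (x)_n = x (x + 1) ... (x + n - 1), with (x)_0 = 1."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def polya_direct(a, added, n):
    """Probability of n black draws, multiplying the conditional
    probabilities (a + k*added) / (2a + k*added) draw by draw."""
    p = Fraction(1)
    for k in range(n):
        p *= Fraction(a + k * added, 2 * a + k * added)
    return p

def polya_from_r(r, n):
    """The same probability written in terms of the correlation r, as in (5)."""
    x = (1 - r) / (2 * r)
    return rising(x, n) / rising(2 * x, n)

a, added, n = 3, 2, 5
r = Fraction(added, 2 * a + added)          # equation (4)
print(polya_direct(a, added, n), polya_from_r(r, n))
```

Using `Fraction` keeps both routes exact, so any disagreement would indicate an algebraic slip rather than rounding.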

$P_n$ may be expressed in terms of the beta function, as follows:


(6) $P_n = \frac{B(n + [1 - r]/2r,\ [1 - r]/2r)}{B([1 - r]/2r,\ [1 - r]/2r)},$


or, equivalently, as a terminating hypergeometric series:

(7) $P_n = 2^{-n} F(-n/2,\ [1 - n]/2;\ 1/2r;\ 1) = 2^{-n}\left\{1 + \frac{n(n - 1)}{2}\,r + \frac{n(n - 1)(n - 2)(n - 3)}{2 \cdot 4}\,\frac{r^2}{1 + 2r} + \cdots\right\}.$

[The identity between (6) and (7) follows from the theorem on $F(a, b; c; 1)$ (see [4], p. 282) and from the multiplication theorem for gamma functions, [4], p. 240. See also [4], p. 262, problem 37.]


Note that if $r \to 0$, the probability (7) approaches $2^{-n}$, which is the usual result for a sequence of Bernoulli trials when the individual probabilities are $\tfrac12$. If $r \to 1$, the expression in braces in (7) becomes the binomial series for $\tfrac12[(1 + 1)^n + (1 - 1)^n]$ and therefore $P_n \to \tfrac12$. This is the case of perfect coherence, which leaves only two possibilities: all black or all red.


3. A generalized urn scheme. It was noted in Section 2 that the Polya scheme 
yields a correlation matrix in which all elements except those on the main diagonal 
have the value (4). The following model exhibits a general correlation matrix. 

As before, the urn contains initially $a$ black balls and $a$ red balls. In contrast with the single addition parameter $A$, this scheme makes use of a matrix of elements $A_{ij}$. One ball is drawn and replaced, and $A_{12}$ balls are added of the color drawn. Again one ball is drawn and replaced; then $A_{13}$ are added of the first color drawn and $A_{23}$ of the second color drawn. After the $(k - 1)$th draw (and replacement), $A_{1k}$ are added of the first color drawn, $A_{2k}$ of the second, etc., and $A_{k-1,k}$ of the $(k - 1)$th color drawn.

To simplify the algebra, let


(8) $D_{mk} = \sum_{i=m+1}^{k} A_{mi}, \qquad m = 1, 2, \cdots, k - 1;$


thus, immediately preceding the $k$th draw, $D_{mk}$ is the total number of balls which have been added up to that time because of the $m$th draw. Some of the $A$'s may be negative, but to prevent overdrawing they must obey the inequality,


(9) $a + \sum_{m=1}^{k-1} D_{mk} \ge 0,$


for all integral $k$ between 2 and $n$, inclusive. ($n$ is again the total number of draws.)


The probabilities of the sequences black-black and black-red in the first two draws are, respectively,

(10) $P_{++} = \frac{a(a + D_{12})}{2a(2a + D_{12})}, \qquad P_{+-} = \frac{a(a)}{2a(2a + D_{12})}.$

By symmetry, $P_{--} = P_{++}$ and $P_{-+} = P_{+-}$, as in the Polya scheme. For three draws the probabilities are


(11)
$P_{+++} = \frac{a(a + D_{12})(a + D_{13} + D_{23})}{2a(2a + D_{12})(2a + D_{13} + D_{23})},$

$P_{++-} = \frac{a(a + D_{12})(a)}{2a(2a + D_{12})(2a + D_{13} + D_{23})},$

$P_{+-+} = \frac{a(a)(a + D_{13})}{2a(2a + D_{12})(2a + D_{13} + D_{23})},$

$P_{+--} = \frac{a(a)(a + D_{23})}{2a(2a + D_{12})(2a + D_{13} + D_{23})},$


and the other four may be obtained by symmetry.


What are the correlation coefficients in this scheme? Again let $x_i = 1$ or $-1$ if the $i$th draw is black or red, respectively, so that $r_{ij} = E(x_i x_j)$. For the first two draws,


(12) $r_{12} = E(x_1 x_2) = 2(P_{++} - P_{+-}) = D_{12}/(2a + D_{12}).$


The last equality follows from the substitution of (10). Equation (12) conforms with (4), since for two draws the two urn schemes are identical.

For the first three draws, by (10) and (11),


(13)
$r_{13} = 2(P_{+++} - P_{++-} + P_{+-+} - P_{+--})$

$= 2(P_{+++} - P_{++-}) + 2(P_{+-+} - P_{+--})$

$= 2P_{++}\,\frac{D_{13} + D_{23}}{2a + D_{13} + D_{23}} + 2P_{+-}\,\frac{D_{13} - D_{23}}{2a + D_{13} + D_{23}}$

$= \frac{2D_{13}(P_{++} + P_{+-}) + 2D_{23}(P_{++} - P_{+-})}{2a + D_{13} + D_{23}}$

$= \frac{D_{13} + r_{12}D_{23}}{2a + D_{13} + D_{23}},$


and by a similar calculation,


(14) $r_{23} = \frac{r_{12}D_{13} + D_{23}}{2a + D_{13} + D_{23}}.$

The above method is easily generalized for the first $n$ draws. The result is


(15) $r_{in} = \frac{\sum_{j=1}^{n-1} r_{ij}D_{jn}}{2a + \sum_{j=1}^{n-1} D_{jn}}, \qquad i = 1, 2, \cdots, n - 1,$

where $r_{ii} = 1$ for all $i$. Notice that if all the correlation coefficients are known for the first $(n - 1)$ draws, then equation (15) gives the remaining coefficients necessary to correlate the $n$th draw.

The next quantity to be calculated is the ratio $2P_n/P_{n-1}$, where $P_n$ is again the probability of drawing $n$ black balls in the first $n$ trials. ($P_1 = P_+ = \tfrac12$, $P_2 = P_{++}$, $P_3 = P_{+++}$, etc.) By the first of equations (10),

(16) $\frac{2P_{++}}{P_+} = \frac{2(a + D_{12})}{2a + D_{12}} = 1 + \frac{D_{12}}{2a + D_{12}}.$


By equations (10) and (11),

(17) $\frac{2P_{+++}}{P_{++}} = 1 + \frac{D_{13} + D_{23}}{2a + D_{13} + D_{23}}.$

Similarly, for $n$ draws,


(18) $\frac{2P_n}{P_{n-1}} = 1 + \frac{\sum_{j=1}^{n-1} D_{jn}}{2a + \sum_{j=1}^{n-1} D_{jn}}.$

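The next objective is to express the ratios (18) as functions of the correlation coefficients alone. A new variable is introduced:


(19) $G_{in} = \frac{D_{in}}{2a + \sum_{j=1}^{n-1} D_{jn}}.$


Then equations (15) may be written


(20) $r_{in} = \sum_{j=1}^{n-1} r_{ij}G_{jn}, \qquad i = 1, 2, \cdots, n - 1,$


and equation (18) may be written in the form


(21) $-1 = \sum_{j=1}^{n-1} G_{jn} - 2P_n/P_{n-1}.$

Now equations (20) and (21) constitute $n$ equations in the $n$ unknowns, $G_{1n}, G_{2n}, \cdots, G_{n-1,n}$, and $2P_n/P_{n-1}$. The equations may be solved directly for the last quantity; then a simple manipulation of the determinants yields the result,


(22) $\frac{2P_n}{P_{n-1}} = \frac{\begin{vmatrix} 1 + r_{1n} & r_{12} + r_{1n} & \cdots & r_{1,n-1} + r_{1n} \\ r_{12} + r_{2n} & 1 + r_{2n} & \cdots & r_{2,n-1} + r_{2n} \\ \vdots & & & \vdots \\ r_{1,n-1} + r_{n-1,n} & r_{2,n-1} + r_{n-1,n} & \cdots & 1 + r_{n-1,n} \end{vmatrix}}{\begin{vmatrix} 1 & r_{12} & \cdots & r_{1,n-1} \\ r_{12} & 1 & \cdots & r_{2,n-1} \\ \vdots & & & \vdots \\ r_{1,n-1} & r_{2,n-1} & \cdots & 1 \end{vmatrix}}.$


The denominator is simply the determinant of the $(n - 1)$-variate correlation matrix.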

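As in the case of the Polya scheme, the correlation coefficients may now be regarded as continuous rather than discrete. These coefficients may assume any values between $-1$ and 1 which do not violate (9), i.e., which do not lead to negative probabilities. By comparison with equation (18), it follows that the inequality (9) may be rewritten


(23) $P_k/P_{k-1} \ge 0,$
for $k = 2, 3, \cdots, n$, where $P_k/P_{k-1}$ is given by (22).

Finally, by induction, the probability $P_n$ is given by

(24) $P_n = 2^{-n}(1 + r_{12})\,\frac{\begin{vmatrix} 1 + r_{13} & r_{12} + r_{13} \\ r_{12} + r_{23} & 1 + r_{23} \end{vmatrix}}{\begin{vmatrix} 1 & r_{12} \\ r_{12} & 1 \end{vmatrix}} \cdots \frac{\begin{vmatrix} 1 + r_{1n} & r_{12} + r_{1n} & \cdots & r_{1,n-1} + r_{1n} \\ r_{12} + r_{2n} & 1 + r_{2n} & \cdots & r_{2,n-1} + r_{2n} \\ \vdots & & & \vdots \\ r_{1,n-1} + r_{n-1,n} & r_{2,n-1} + r_{n-1,n} & \cdots & 1 + r_{n-1,n} \end{vmatrix}}{\begin{vmatrix} 1 & r_{12} & \cdots & r_{1,n-1} \\ r_{12} & 1 & \cdots & r_{2,n-1} \\ \vdots & & & \vdots \\ r_{1,n-1} & r_{2,n-1} & \cdots & 1 \end{vmatrix}}.$


When all the coefficients $r_{ij}$ are equal for $i, j = 1, 2, \cdots, n$ $(i \ne j)$, equation (24) reduces to the Polya result (5).
When $n = 2$ and 3 the result (24) becomes simply


(25) $P_2 = \tfrac14 (1 + r_{12}),$
(26) $P_3 = \tfrac18 (1 + r_{12} + r_{13} + r_{23}).$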
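
The recursion behind these results can be sketched numerically. Solving the linear equations (20) for the $G_{jn}$ and substituting in (21) gives each ratio $2P_k/P_{k-1}$ in turn, which avoids expanding determinants. The code below is an illustrative sketch only; the helper names `solve` and `urn_probability` are invented here, and a small Gauss-Jordan elimination stands in for a linear algebra library.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination
    with partial pivoting (illustrative helper)."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                for j in range(col, n + 1):
                    aug[i][j] -= factor * aug[col][j]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def urn_probability(r):
    """P_n for the generalized urn, given the full n x n correlation
    matrix r and taking the draws in the order 1, 2, ..., n."""
    n = len(r)
    p = 0.5                                   # P_1
    for k in range(1, n):                     # bring in draw number k + 1
        sub = [row[:k] for row in r[:k]]      # correlations of earlier draws
        rhs = [r[i][k] for i in range(k)]     # correlations with the new draw
        g = solve(sub, rhs)                   # the G's of equation (20)
        p *= 0.5 * (1.0 + sum(g))             # equation (21)
    return p

r3 = [[1.0, 0.1, 0.2], [0.1, 1.0, 0.3], [0.2, 0.3, 1.0]]
print(urn_probability(r3))                    # compare with (26)
```

For $n > 3$ the value returned depends on which variable is treated as the last draw, so reordering the rows and columns of `r` is a direct way to see the lack of symmetry.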


For higher values of $n$ the complete expansion of (24) is quite complicated. However $P_n$ may be expanded in a power series in the $r$'s. To second order, for $n = 4$,


(27) $P_4 = \tfrac1{16}(1 + r_{12} + r_{13} + r_{23} + r_{14} + r_{24} + r_{34} + r_{12}r_{34} + r_{13}r_{24} + r_{14}r_{23}) + O(r^3).$


By induction, it can be shown that, for general $n$,


(28) $P_n = 2^{-n}\left[1 + \sum_{j>i\ge1} r_{ij} + \sum_{l>k>j>i\ge1}(r_{ij}r_{kl} + r_{ik}r_{jl} + r_{il}r_{jk}) + O(r^3)\right].$


When all the coefficients $r_{ij}$ are equal $(i \ne j)$, the number of first-order terms in (28) is the binomial coefficient $\binom{n}{2}$. The number of second-order terms is $3\binom{n}{4}$; therefore this special case of (28) checks (to second order) with the result (7) of the Polya model.

If $P_n$ is expanded to a higher order, then the series is no longer symmetric with respect to interchange of the variables $x_1, x_2, \cdots, x_n$.


4. The multivariate normal distribution. The following example belongs to the second type of problem given in the introduction.

Suppose that $\xi_1, \xi_2, \cdots, \xi_n$ obey the multivariate normal distribution law with correlation matrix


(29) $\begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{12} & 1 & \cdots & \rho_{2n} \\ \vdots & & & \vdots \\ \rho_{1n} & \rho_{2n} & \cdots & 1 \end{pmatrix}.$


If all the mean values $E(\xi_i)$ are zero, what is the probability $P_n$ that all $n$ variables are simultaneously positive?

As stated in the introduction, this question may be reduced to the corresponding problem in the discrete variables $x_1, x_2, \cdots, x_n$ by means of equation (1); however, it must be remembered that the correlation $r_{ij}$ between $x_i$ and $x_j$ is not the same as the correlation $\rho_{ij}$ between $\xi_i$ and $\xi_j$.

Various writers have investigated the probability $P_n$ for the multivariate normal distribution. For $n = 2$ there is the Stieltjes-Sheppard result [5],


(30) $P_2 = \tfrac14\left(1 + \frac{2}{\pi}\sin^{-1}\rho_{12}\right).$


For $n = 3$ the result is (see Kendall [7] and David [8]):


(31) $P_3 = \tfrac18\left[1 + \frac{2}{\pi}(\sin^{-1}\rho_{12} + \sin^{-1}\rho_{13} + \sin^{-1}\rho_{23})\right].$


For $n > 3$ no solution has been given in closed form, but there exists the infinite series of Aitken, Kendall ([6], [7]), and Moran [9]. For $n = 4$, their series may be written, to second order in the $\rho_{ij}$,


(32) $P_4 = \tfrac1{16}\left[1 + \frac{2}{\pi}\sum_{j>i\ge1}^{4}\sin^{-1}\rho_{ij} + \frac{4}{\pi^2}(\rho_{12}\rho_{34} + \rho_{13}\rho_{24} + \rho_{14}\rho_{23}) + O(\rho^3)\right].$


For general $n$, the probability is

(33) $P_n = 2^{-n}\left[1 + \frac{2}{\pi}\sum_{j>i\ge1}^{n}\sin^{-1}\rho_{ij} + \frac{4}{\pi^2}\sum_{l>k>j>i\ge1}^{n}(\rho_{ij}\rho_{kl} + \rho_{ik}\rho_{jl} + \rho_{il}\rho_{jk}) + O(\rho^3)\right].$

(This result will be derived in Section 6.)


Equation (33) may be compared with the corresponding probability (28) for the generalized urn scheme. Note that under the transformation,


(34) $r_{ij} = \frac{2}{\pi}\sin^{-1}\rho_{ij},$


the two expressions (28) and (33) agree to second order. [Note that $r_{ij}$, given by (34), is actually the correlation between the discrete variables $x_i$ and $x_j$, given by equation (1), when the $\xi$'s are normal. See equation (41).] This agreement suggests that the substitution of (34) into the closed form (24) of the result for the generalized urn scheme might provide an approximation for $P_n$ in the multivariate normal case, to be used in place of the poorly converging series of Aitken, Kendall, and Moran. (See David's remarks [8] on convergence.)

For $n > 3$ and arbitrary $\rho_{ij}$, the agreement between the two power series does not extend beyond the second-order terms.


5. Numerical results. The approximation indicated above has been tested by 
a comparison with several known results for the multivariate normal integral. 

When all the $\rho_{ij}$ have the same value, denoted by $\rho$, then $r$ is obtained from equation (34); then this value of $r$ is used in the closed expression (5) of the Polya scheme.

When $n = 2$ or 3, the result obtained from (5) is exact for all values of $\rho$; that is, equation (30) and the special case of (31) follow immediately.

When $\rho = 0$, (5) gives $P_n = 2^{-n}$; when $\rho = 1$, $P_n = \tfrac12$. (See the explanation at the end of Section 2.) It appears, therefore, that when $\rho = 0$ or 1 the results are exact for all values of $n$.

When $\rho = \tfrac12$, then $r = \tfrac13$ and equation (5) gives $P_n = 1/(n + 1)$. This result is also exact for all $n$. {See Ruben [10], p. 214, equation (70). In fact, Ruben's (70) holds for a more general class of distributions, as shown by Foster and Stuart [11], p. 22.}

When $1/\rho = 2, 3, \cdots, 12$, the results obtained from equation (5) may be compared with those of Ruben ([10], pp. 222-223). For the case $n = 4$, the comparison is shown in Table I. (Ruben's values have been rounded off to seven decimal places.)
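
The Polya approximation for equal correlations amounts to two steps: convert $\rho$ to $r$ by (34), then evaluate the ratio of rising factorials in (5). The following is a minimal sketch; the function name `polya_orthant` is an assumption made here, and the case $\rho = 0$ is handled separately because (5) is then defined only as a limit.

```python
import math

def polya_orthant(rho, n):
    """Polya-urn approximation to the probability that n equicorrelated
    normal variables are all positive, via (34) and then (5)."""
    r = 2.0 / math.pi * math.asin(rho)
    if r == 0.0:
        return 0.5 ** n                     # limiting value as r -> 0
    x = (1.0 - r) / (2.0 * r)
    p = 1.0
    for k in range(n):
        p *= (x + k) / (2.0 * x + k)        # ratio of rising factorials in (5)
    return p

for inv_rho in (2, 3, 4):
    print(inv_rho, polya_orthant(1.0 / inv_rho, 4))
```

With $n = 4$ this reproduces the column computed from (5) in Table I, and with $\rho = \tfrac12$ it returns $1/(n + 1)$ for every $n$.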

The best agreement in Table I occurs for small $\rho$, as one might have predicted after a comparison of the corresponding power series.

For a given value of $\rho$, the approximation grows steadily worse as $n$ increases. A comparison for $\rho = \tfrac14$ is shown in Table II.


TABLE I
($n = 4$)

$1/\rho$    $P_4$ [from (5)]    Ruben's value [10]

 2          0.20000 00          0.20000 00
 3          0.14975 57          0.14973 77
 4          0.12649 38          0.12647 92
 5          0.11302 30          0.11301 25
 6          0.10423 15          0.10422 40
 7          0.09804 22          0.09803 67
 8          0.09344 92          0.09344 51
 9          0.08990 58          0.08990 27
10          0.08708 94          0.08708 71
11          0.08479 73          0.08479 5
12          0.08289 56          0.08289 4

TABLE II
($\rho = \tfrac14$)

$n$         $P_n$ [from (5)]    Ruben's value [10]

 2          0.29021 53          0.29021 53
 3          0.18532 30          0.18532 30
 4          0.12649 38          0.12647 92
 5          0.09069 62          0.09065 98
 6          0.06754 16          0.06748 27
 7          0.05183 56          0.05175 69
 8          0.04076 86          0.04067 37
 9          0.03272 29          0.03261 57
10          0.02671 93          0.02660 32





TABLE III 


px | pu P. (from (24)] Plackett’s #? 





. 13393 0.13333 
13194 
13194 
13393 


0 4 


occ co 


16369 
. 16369 
. 16369 
16369 





oocooco 


15000 
15278 
14881 
0.14881 





oo 

















It appears from Tables I and II that the Polya urn approximation might be useful in many problems, at least where $\rho$ is not greater than $\tfrac12$ and where $n$ is not much greater than 4. Formulas (5) and (34) are certainly more easily applicable than Ruben's integral recursion formulas or an interpolation in Ruben's table.

In the general case of unequal $\rho_{ij}$, the results are much less satisfactory. This fact can be illustrated by a comparison with several exact values given by Plackett ([12], p. 360) for the quadrivariate case. The comparison is shown in Table III.

To obtain $P_4$, one substitutes the values of $r_{ij}$ from (34) into the closed expression (24). When $n > 3$, (24) is not symmetric with respect to interchange of the indices; therefore different results are possible. The four values of $P_4$ given in Table III (for each correlation matrix) are those obtained when $x_4$, $x_3$, $x_2$, and $x_1$, respectively, are considered as the fourth draw from the urn. Without doubt the lack of symmetry in (24) is partly responsible for the poor agreement.

It appears from Table III (and from other comparisons with Plackett's figures) that the generalized urn approximation is not a satisfactory method for computing the general multivariate normal integral. The only possible exceptions would be those situations in which the $\rho_{ij}$ are very small or are all nearly equal to each other, i.e., approaching the Polya case.

Recently Plackett [12] has given a numerical method for evaluating the 
quadrivariate normal integral. It involves more labor than the urn scheme but 
yields considerably greater accuracy. 


6. General remarks. This paper will be concluded by a discussion of the 
general problem described in the introduction, in which no specific model or 
distribution law is assumed. 

Consider again the $n$ mutually interacting (discrete) random variables $x_1, x_2, \ldots, x_n$. Let the observed values of these variables (in a given experiment) be $y_1, y_2, \ldots, y_n$, where $y_i = \pm 1$. Then there are $2^n$ possible combinations for the $n$ results $y_i$, and each result has the probability

(35)    $P(x_1 = y_1, x_2 = y_2, \ldots, x_n = y_n)$.


For any given distribution law there are $2^n$ product moments, i.e., the expected values $E(1)$, $E(x_j)$, $E(x_j x_k)$, $E(x_j x_k x_l)$, $\ldots$, $E(x_1 x_2 \cdots x_n)$. All other moments degenerate to one of these, since $x_i^2 = 1$ (e.g., $E(x_i^2) = E(1)$; $E(x_i x_j^2) = E(x_i)$).

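The $2^n$ moments may be expressed in terms of the $2^n$ probabilities, as follows:

(36)
    $E(1) = \sum_{y_i} 1 \cdot P(x_1 = y_1, x_2 = y_2, \ldots, x_n = y_n)$,

    $E(x_j) = \sum_{y_i} y_j \, P(x_1 = y_1, x_2 = y_2, \ldots, x_n = y_n), \quad j = 1, 2, \ldots, n$,

    $E(x_j x_k) = \sum_{y_i} y_j y_k \, P(x_1 = y_1, x_2 = y_2, \ldots, x_n = y_n)$,

    $\ldots$

    $E(x_1 x_2 \cdots x_n) = \sum_{y_i} y_1 y_2 \cdots y_n \, P(x_1 = y_1, x_2 = y_2, \ldots, x_n = y_n)$,

where the sums are taken over all combinations $y_1 = \pm 1$, $y_2 = \pm 1$, $\ldots$, $y_n = \pm 1$.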

If all the equations (36) are added together, then all the probabilities cancel except $P(x_1 = x_2 = \cdots = x_n = 1)$, which was previously called $P_n$. Then

(37)    $P_n = 2^{-n} \Big[ 1 + \sum_{i \ge 1} E(x_i) + \sum_{j > i \ge 1} E(x_i x_j) + \cdots + E(x_1 x_2 \cdots x_n) \Big]$.
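The identity (37) can be checked numerically for any law on the $2^n$ sign patterns. The following is only a minimal sketch: the helper names (`product_moment`, `all_plus_probability`, `random_law`) and the randomly generated law are illustrative assumptions, not part of the paper.

```python
from itertools import combinations, product
from math import prod
import random


def product_moment(probs, subset):
    """E(prod of x_i over i in subset) for a law on {-1,+1}^n given as {outcome: probability}."""
    return sum(p * prod(y[i] for i in subset) for y, p in probs.items())


def all_plus_probability(probs, n):
    """Right-hand side of (37): 2^(-n) times the sum of all 2^n product moments."""
    total = 0.0
    for order in range(n + 1):
        for subset in combinations(range(n), order):
            total += product_moment(probs, subset)
    return total / 2 ** n


def random_law(n, seed=0):
    """An arbitrary (hypothetical) probability law on the 2^n sign patterns."""
    rng = random.Random(seed)
    outcomes = list(product((-1, 1), repeat=n))
    weights = [rng.random() for _ in outcomes]
    scale = sum(weights)
    return {y: w / scale for y, w in zip(outcomes, weights)}


if __name__ == "__main__":
    law = random_law(4, seed=1)
    # The two numbers should agree up to rounding error.
    print(all_plus_probability(law, 4), law[(1, 1, 1, 1)])
```

The empty subset contributes $E(1) = 1$, which is why the loop starts at order 0.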


Now suppose that the symmetry of the distribution law is such that all product moments of odd order are zero, i.e.,

(38)
    $E(x_1) = E(x_2) = \cdots = E(x_n) = 0$,

    $E(x_1 x_2 x_3) = E(x_1 x_2 x_4) = \cdots = E(x_{n-2} x_{n-1} x_n) = 0$,

etc. Then (37) becomes

(39)    $P_n = 2^{-n} \Big[ 1 + \sum_{j > i \ge 1} E(x_i x_j) + \sum_{l > k > j > i \ge 1} E(x_i x_j x_k x_l) + \cdots \Big]$,

ending with a product moment of order $n$ if $n$ is even, or with moments of order $(n - 1)$ if $n$ is odd.

It is now evident that $P_2$ and $P_3$ in equations (25) and (26) could be obtained directly from the general formula (39) by the substitution of $n = 2$ or 3 and of the definition $E(x_i x_j) = r_{ij}$. In other words, it is only for $n > 3$ that a specific urn model must be assumed, and this specialization is reflected in the values of the higher-order moments $E(x_i x_j x_k x_l)$, etc. in (39).

On the other hand, suppose $\xi_i$ are continuous random variables obeying a given distribution law with symmetry as in (38). Then for the calculation of $P_2$ and $P_3$, $E(x_i x_j)$ must be obtained as a function of the parameters of the original distribution. For $P_4$ and $P_5$, $E(x_i x_j x_k x_l)$ must be obtained, etc., and all other $P$'s will follow two at a time from the higher moments.

It is now possible to derive equations (31) and (33). Assume that $P_2$ is given by (30). By matching (30) with (39) when $n = 2$, it follows that

(40)    $E(x_1 x_2) = \frac{2}{\pi} \sin^{-1} \rho_{12}$.

Then by the symmetry of the normal distribution,

(41)    $E(x_i x_j) = \frac{2}{\pi} \sin^{-1} \rho_{ij}$,

for all $i$ and $j$, $i \ne j$, and equation (31) follows by the substitution of the moments (41) into the general expression (39) with $n = 3$.
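As a small worked illustration of this step, the moments (41) can be substituted into (39) for $n = 2$ and $n = 3$. The sketch below assumes that each $x_i$ is the sign of a standardized normal coordinate, so that $P_2$ and $P_3$ are positive-orthant probabilities; the function names are hypothetical.

```python
from math import asin, pi


def sign_moment(rho):
    """E(x_i x_j) from (41): (2/pi) * arcsin(rho_ij)."""
    return 2.0 / pi * asin(rho)


def p2(rho12):
    """(39) with n = 2: P_2 = 2^(-2) * [1 + E(x_1 x_2)]."""
    return (1.0 + sign_moment(rho12)) / 4.0


def p3(rho12, rho13, rho23):
    """(39) with n = 3: P_3 = 2^(-3) * [1 + sum of the three second-order moments]."""
    total = sign_moment(rho12) + sign_moment(rho13) + sign_moment(rho23)
    return (1.0 + total) / 8.0


if __name__ == "__main__":
    print(p2(0.5))            # 1/4 + (1/(2*pi)) * arcsin(1/2) = 1/3
    print(p3(0.0, 0.0, 0.0))  # independent coordinates: 1/8
```

For $n = 3$ no moment of order higher than two enters, which is why no urn model is needed in that case.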

Now assume that $P_4$ is given correctly by equation (32). Then (32) may be matched with (39) when $n = 4$, with the aid of (41), and the result is given by

(42)    $E(x_1 x_2 x_3 x_4) = \frac{4}{\pi^2} (\rho_{12}\rho_{34} + \rho_{13}\rho_{24} + \rho_{14}\rho_{23}) + O(\rho^3)$.

Then, by symmetry, the general fourth-order product moment is

(43)    $E(x_i x_j x_k x_l) = \frac{4}{\pi^2} (\rho_{ij}\rho_{kl} + \rho_{ik}\rho_{jl} + \rho_{il}\rho_{jk}) + O(\rho^3), \quad i < j < k < l$,

and, since all higher-order moments are of higher order in the $\rho_{ij}$, equation (33) follows. The last operation is the substitution of the moments (41) and (43) into the general expression (39).

This process is equivalent to a method used by David [8] to obtain $P_{n+1}$ from $P_n$ when $n$ is even.


STATISTICS AND SUBFIELDS¹


By R. R. Bahadur
The University of Chicago 


1. Introduction and summary. Let $(X, S, \mu)$ be a probability measure space: $X$ is a set of points $x$, $S$ is a field of subsets of $X$, and $\mu$ is a countably additive measure on $S$ with $\mu(X) = 1$.² A subfield is a field $S_0$ of subsets of $X$ such that $S_0 \subset S$, that is, each $S_0$-measurable set is also $S$-measurable. A statistic is a function defined on $X$. There is no a priori restriction on the class of statistics; in particular, statistics are not necessarily real-valued, and a real-valued statistic is not necessarily an $S$-measurable function. For any statistic $f$, let $S_f$ denote the class of all sets which are $S$-measurable and of the form $f^{-1}(B)$, where $B$ is a subset of the range of $f$. The class $S_f$ is clearly a subfield, and is called the subfield induced by $f$.

The induced subfield $S_f$ plays a central role in the study of a statistic $f$, for the following reason. The probabilist or mathematical statistician is usually concerned not with the statistic $f$ as such, but rather with the class of random variables (i.e., real-valued $S$-measurable functions) which depend on $x$ only through $f$, and, as is easily seen, this class of random variables is exactly the class of real-valued $S_f$-measurable functions. In case the given statistic $f$ is a random variable (and therefore itself an object of study), the argument just given continues to apply, because in this case $f$ is necessarily an $S_f$-measurable function.

This paper discusses certain measure-theoretic problems concerning the relations between subfields, subfields of the apparently special form $S_f$, and statistics. The main problems, as also the main conclusions, are described in the following paragraphs. Most of the conclusions of the paper are valid only in the case when $(X, S)$ is (or may be taken to be) a euclidean sample space, that is, $X$ is a Borel set of the $m$-dimensional euclidean space ($1 \le m \le \infty$), and $S$ is the field of Borel sets of $X$. It is assumed henceforth that this is the case under consideration.

There are two main problems. The first is whether every subfield is inducible 
by a statistic. This problem is discussed (in a more general setting) in [2], and the 
conclusions of the present paper complement those of [2]. 

It is shown here that every subfield is inducible by a statistic if and only if the sample space is discrete, that is to say, $X$ is a countable set and $S$ is the class of all subsets of $X$ (Theorem 1). This result is, however, not quite relevant to situations where the natural equivalence relation between subfields is not identity but approximability to within sets of $\mu$-measure zero. The equivalence relation

Received May 24, 1954. 

¹ This work was supported in part by the Office of Naval Research under Contract N6onr-271, T.O. X1, Project 042-034.

² This paper uses some of the notation and terminology of the first part of [1]. In particular, all fields considered are understood to be countably additive.
referred to is defined as follows. A subfield $S_1$ is a contraction of a subfield $S_2$ if corresponding to each real-valued $S_1$-measurable function $f_1$ there exists an $S_2$-measurable function $f_2$ such that $f_2(x) = f_1(x)$ except on a set of $\mu$-measure zero; we then write $S_1 \subset S_2$ $[S, \mu]$. The subfields $S_1$ and $S_2$ are equivalent if each is a contraction of the other; we then write $S_1 = S_2$ $[S, \mu]$. It is shown that, in fact, corresponding to any subfield $S_0$ there exists an $f$ such that $S_f$ is equivalent to $S_0$, and that this $f$ may be taken to be a random variable (Theorem 2).

In the literature the notion of contraction (and the derived notion of equivalence) has been defined for statistics in two ways, which are here called contraction and functional contraction. A statistic $f$ is a contraction of a statistic $g$ if $S_f$ is a contraction of $S_g$ (that is, $S_f \subset S_g$ $[S, \mu]$); $f$ is a functional contraction of $g$ (written $f \subset g$ $[S, \mu]$) if there exists a function $h$ on the range of $g$ into that of $f$, and an $S$-measurable set $N$ with $\mu(N) = 0$, such that $f(x) = h(g(x))$ for $x$ in $X - N$. (Cf. [8], [4].) It seems to the writer that for most (possibly all) technical purposes the relevant concept is contraction as just defined (cf. Lemmas 7.1 and 3.2 of [1]). However, functional contraction has simpler interpretations and greater intuitive appeal.

The second problem is the exact relation between contraction and functional contraction. It is shown that, in general, functional contraction does not imply contraction (Example 1), and also that contraction does not imply functional contraction (Example 2). If, however, both $f$ and $g$ are random variables, then $S_f \subset S_g$ $[S, \mu]$ if and only if $f \subset g$ $[S, \mu]$ (Theorem 3). It follows, in particular, that if the sample space is discrete, then contraction coincides with functional contraction.

The problems described above arose in connection with the theory of sufficiency, and the results have applications in that theory. It follows, for example (assuming that the sample space is euclidean and that the set of alternative distributions of the sample point is a dominated set), that if $f$ is a necessary and sufficient statistic, then $S_f$ is a necessary and sufficient subfield (Corollary 2).

The following are some general conclusions bearing on mathematical models for studies such as [1]. (a) The notion of a subfield, while certainly no less general than that of a statistic, is in fact no more general. (b) There is no loss of generality, or other disadvantage, in defining a statistic to be a random variable. On the contrary, admission of nonmeasurable functions to the discussion leads to inconsistencies between extension and functional extension; this seems undesirable. (c) If $f$ is a random variable, it is immaterial whether $f$ is regarded as a statistic or as a Borel-measurable transformation (cf. [1], p. 431). These satisfactory conclusions do not necessarily hold for an arbitrary space $(X, S, \mu)$. An example given in [2] shows that at least (a) and (c) are not valid in the general case.


2. Theorems. Let $R$ be the real line, and $\mathcal{R}$ be the class of Borel sets of $R$. In general, we shall denote the $n$-dimensional euclidean space by $R^n$ and the class of Borel sets of $R^n$ by $\mathcal{R}^n$ ($1 \le n \le \infty$). The following well-known result (cf. [5], pp. 159-160) is stated here as a lemma for convenience of reference.

Lemma 1. There exists a one to one function $\alpha_n$ on $R^n$ onto $R$ such that $A \in \mathcal{R}^n$ implies $\alpha_n(A) \in \mathcal{R}$ and $B \in \mathcal{R}$ implies $\alpha_n^{-1}(B) \in \mathcal{R}^n$ ($1 \le n \le \infty$).

If $f$ is a statistic on $X$ into a space $Y$, and $C$ is a class of subsets of $Y$, then $f^{-1}(C)$ denotes the class of all sets of $X$ which are of the form $f^{-1}(B)$ with $B \in C$. Clearly, $f^{-1}(C)$ is a field if and only if $C$ is a field. A function $f$ on $X$ into $R^n$ is said to be $S$-measurable if $f^{-1}(\mathcal{R}^n)$ is a subfield of $S$.

In this section and the following one, a number of results involving S-measur- 
able functions (specifically, Lemmas 2, 4, 5, 6, 8, and 9; Theorems 2 and 3; 
Corollaries 1 and 3) are stated and proved in terms of real-valued functions. It 
can be seen from Lemma 1, or otherwise directly from the proofs, that these 
results are in fact valid for S-measurable functions in general. 

Lemma 2. A function $f$ on $X$ into $R$ is $S$-measurable if and only if $f$ is an $S_f$-measurable function.

Proof. Since $S_f \subset S$ in any case, $f^{-1}(\mathcal{R}) \subset S_f$ implies $f^{-1}(\mathcal{R}) \subset S$. Conversely, if $f$ is $S$-measurable, then $A \in f^{-1}(\mathcal{R})$ implies $A \in S_f$, by the definition of $S_f$, so that $f^{-1}(\mathcal{R}) \subset S_f$. This completes the proof.

THEOREM 1. A necessary and sufficient condition that every subfield of S be in- 
ducible by a statistic is that X be a countable set. 

Proof. Suppose first that $X$ is countable, and let there be given a field $S_0 \subset S$. For each $x \in X$ let $E_x$ be the intersection of all sets $A \subset X$ such that $x \in A$ and $A \in S_0$. Let $D_1, D_2, \ldots$, be an enumeration of the sets $E_x$ such that $D_i \cap D_j$ is empty for $i \ne j$ and $\bigcup_i D_i = X$. Define $f(x) = i$ for $x \in D_i$ ($i = 1, 2, \ldots$). We shall show that $S_0 = S_f$.

Since $X$ is separable in the discrete topology, the intersection of any collection of subsets of $X$ equals the intersection of a countable subcollection. Hence $D_i \in S_0$ for each $i$. Since $S_f = \{f^{-1}(N) : N \subset I\}$ where $I$ is the set of positive integers, and $f^{-1}(N) = \bigcup_{i \in N} D_i$, it follows that $A \in S_f$ implies $A \in S_0$. To prove the converse, choose and fix an $A \in S_0$. Since $x \in E_x$ for all $x$, we have $A \subset \bigcup_{x \in A} E_x$; on the other hand, $E_x \subset A$ for each $x \in A$, so that $\bigcup_{x \in A} E_x \subset A$; hence $A = \bigcup_{x \in A} E_x = \bigcup_{i \in N} D_i = f^{-1}(N)$ for some $N \subset I$. Thus $A \in S_0$ implies $A \in S_f$. Hence $S_f = S_0$. Since $S_0$ is arbitrary, the first part of the theorem is proved.
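The construction in the first part of this proof can be traced on a small example. The sketch below is an illustration only, restricted to a finite set $X$ (the theorem covers countable $X$); the function names and the choice of generating sets are assumptions made for the example.

```python
from itertools import combinations


def generated_field(points, generators):
    """Smallest field on a finite set containing the generators (closed under complement and union)."""
    X = frozenset(points)
    field = {frozenset(), X} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for a in current:
            if X - a not in field:
                field.add(X - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in field:
                field.add(a | b)
                changed = True
    return field


def atoms(points, field):
    """E_x: the intersection of all sets of the field that contain x."""
    X = frozenset(points)
    result = {}
    for x in X:
        e = X
        for a in field:
            if x in a:
                e &= a
        result[x] = e
    return result


def inducing_statistic(points, field):
    """f(x) = i for x in D_i, where D_1, D_2, ... enumerate the distinct sets E_x."""
    ex = atoms(points, field)
    labels = {}
    return {x: labels.setdefault(ex[x], len(labels) + 1) for x in sorted(points)}


def induced_field(points, f):
    """S_f = {f^{-1}(N) : N a subset of the range of f}."""
    values = sorted(set(f.values()))
    out = set()
    for r in range(len(values) + 1):
        for N in combinations(values, r):
            out.add(frozenset(x for x in points if f[x] in N))
    return out


if __name__ == "__main__":
    X = range(6)
    S0 = generated_field(X, [{0, 1}, {1, 2, 3}])
    f = inducing_statistic(X, S0)
    print(f, induced_field(X, f) == S0)
```

Here the sets $E_x$ are $\{0\}$, $\{1\}$, $\{2, 3\}$, $\{4, 5\}$, and the statistic simply labels them.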

To prove the second part suppose that $X$ is an uncountable set. Let $S^*$ be the class of all sets $A$ such that one of the sets $A$ and $X - A$ is countable. Then $S^*$ is a subfield of $S$ such that for each $x \in X$ the set $\{x\}$ belongs to $S^*$. Moreover, it can be shown that $S^* \ne S$, that is to say, there exists at least one $A \in S$ such that neither $A$ nor $X - A$ is countable. It follows from Lemma 1 of [2] that there exists no $f$ such that $S^* = S_f$. This completes the proof.

A subfield $S_0$ is separable if there exists a countable class $C$ of subsets of $X$ such that $S_0$ is the field generated by $C$. While $S$ itself is separable, a given subfield $S_0$ may or may not be separable.³ However, we have:

Lemma 3. Corresponding to any subfield $S_0$ there exists a separable subfield $S^*$ such that $S^* = S_0$ $[S, \mu]$.

Proof. For the purposes of this proof only, for any two sets $A$ and $B$ in $S$ write $A \sim B$ if and only if $(A - B) \cup (B - A)$ is of $\mu$-measure zero. Let $\{\mathcal{C}_\theta : \theta \in \Omega\}$ be the set of equivalence classes generated by the relation $\sim$, where $\Omega$ is an index set of points $\theta$. For any $\theta$ and $\delta$ in $\Omega$ define $\rho(\theta, \delta) = \mu(A - B) + \mu(B - A)$, where $A$ and $B$ are sets in $\mathcal{C}_\theta$ and $\mathcal{C}_\delta$ respectively. Since $S$ is separable, $\Omega$ is a separable metric space under the metric $\rho$ ([5], p. 168).

³ The writer is indebted to Professor A. Dvoretzky for this remark.

Let $\Omega_0$ be the set of all $\theta$ such that $\mathcal{C}_\theta$ contains at least one $S_0$-measurable set. Then $\Omega_0$ is a nonempty subset of $\Omega$ and therefore separable. Let $\Omega^*$ be a countable subset of $\Omega_0$ which is dense in $\Omega_0$. For each $\theta$ in $\Omega^*$ let $A_\theta$ be an $S_0$-measurable set in $\mathcal{C}_\theta$, and let $S^*$ be the field generated by the class $\{A_\theta : \theta \in \Omega^*\}$. It is clear that $S^*$ is a separable field, and that $S^* \subset S_0$. We proceed to show that $S_0 \subset S^*$ $[S, \mu]$.

Choose and fix an $A \in S_0$. By the definition of $\Omega_0$, there exists a $\theta \in \Omega_0$ such that $A \in \mathcal{C}_\theta$. Since $\Omega^*$ is dense in $\Omega_0$, there exists a sequence $\{\theta_n\}$ in $\Omega^*$ such that $\lim_{n \to \infty} \rho(\theta_n, \theta) = 0$. Letting $f_0$ denote the characteristic function of the set $A$, it follows from the definition of $S^*$ that there exists a sequence $f_1, f_2, \ldots$ of $S^*$-measurable characteristic functions such that $\lim_{n \to \infty} f_n = f_0$ in measure. Hence there exists a subsequence of $\{f_n\}$, say $\{g_n\}$, such that, except on an $S$-$\mu$-null set, $\lim_{n \to \infty} g_n(x) = f_0(x)$ ([5], p. 93). Let $B$ be the set of all $x$ such that $\lim_{n \to \infty} g_n(x) = 1$. Then $B$ is $S^*$-measurable, and $A \sim B$. Since $A$ is arbitrary, we conclude that $S_0 \subset S^*$ $[S, \mu]$. This completes the proof.

Lemma 4. If $S^*$ is a separable subfield, there exists an $S$-measurable function $f$ on $X$ into $R$ such that $f^{-1}(\mathcal{R}) = S^*$.

Proof. Suppose that $S^*$ is generated by $C = \{A_1, A_2, \ldots\}$. Let $\phi_i$ be the characteristic function of $A_i$, and define $\psi(x) = (\phi_1(x), \phi_2(x), \ldots)$. Suppose that $\psi$ takes values in the space $R^n$ of points $r^{(n)} = (r_1, r_2, \ldots)$ for $1 \le n \le \infty$. Since $A_i = \{x : \phi_i(x) = 1\} = \psi^{-1}(B_i)$, where $B_i = \{r^{(n)} : r_i = 1\}$, we have $A_i \in \psi^{-1}(\mathcal{R}^n)$ for each $i$; hence $S^* \subset \psi^{-1}(\mathcal{R}^n)$. On the other hand, since each $\phi_i$ is an $S^*$-measurable function, $\psi$ is $S^*$-measurable also, so that $\psi^{-1}(\mathcal{R}^n) \subset S^*$. Thus $S^* = \psi^{-1}(\mathcal{R}^n)$; the lemma as stated now follows from Lemma 1 by taking $f = \alpha_n \psi$.

Lemma 5. If $f$ is an $S$-measurable function on $X$ into $R$, then $f^{-1}(\mathcal{R}) = S_f$ $[S, \mu]$.

Proof. According to Lemma 2, $f^{-1}(\mathcal{R}) \subset S_f$. We have therefore to show that $S_f \subset f^{-1}(\mathcal{R})$ $[S, \mu]$.

We recall that we have assumed $X \subset R^m$, $X \in \mathcal{R}^m$, and $S = \{X \cap A : A \in \mathcal{R}^m\}$. Let $\alpha_m$ be the function described in Lemma 1, and write $\alpha_m(X) = Y$, $\alpha_m(S) = T$, $g(y) = f(\alpha_m^{-1}(y))$ for $y \in Y$. Then $Y$ is a Borel set of the real line, $T$ is the class of Borel sets of $Y$, $g$ is a $T$-measurable function on $Y$ into $R$, and $f^{-1}(\mathcal{R}) = \alpha_m^{-1}(g^{-1}(\mathcal{R}))$, $S_f = \alpha_m^{-1}(T_g)$. Define $\nu(C) = \mu(\alpha_m^{-1}(C))$ for $C \in T$. It is then easily seen that the desired conclusion is equivalent to $T_g \subset g^{-1}(\mathcal{R})$ $[T, \nu]$.

Choose and fix a set $A \in T_g$. By definition of $T_g$, there exists a set $B \subset R$ such that $g^{-1}(B) = A$. Now, since $A$ is a Borel set, and $g$ is a Borel measurable function, it follows from Lusin's theorem ([6], p. 72) that for each $k = 1, 2, \ldots$ there exists a set $A_k \in T$ such that $A_k \subset A$, $\nu(A - A_k) < 1/k$, and $g(A_k) \in \mathcal{R}$. Let $A_0 = \bigcup_k A_k$. Then $A_0 \in T$, $A_0 \subset A$, $\nu(A - A_0) = 0$, and $g(A_0) = \bigcup_k g(A_k) = B_0$ (say) is a Borel set. Now, $g^{-1}(B_0) = g^{-1}(g(A_0)) \supset A_0$. Also, $B_0 = g(A_0) \subset g(A) = B$, so that $g^{-1}(B_0) \subset g^{-1}(B) = A$. Hence $C = g^{-1}(B_0)$ is a set such that $A_0 \subset C \subset A$, so that $\nu(A - C) = 0$; since $C$ is a set in $g^{-1}(\mathcal{R})$, and since $A \in T_g$ in this argument is arbitrary, it follows that $T_g \subset g^{-1}(\mathcal{R})$ $[T, \nu]$. This completes the proof.

Remark 1. The preceding argument shows that $\mu$ is a perfect measure, i.e., for each real-valued $S$-measurable function $f$, corresponding to each set $A$ in $S_f$ there exists a $B$ in $f^{-1}(\mathcal{R})$ such that $B \subset A$ and $\mu(A - B) = 0$. (Cf. [9], p. 18; also pp. 248-251.) Perfection is a little stronger than the property stated in Lemma 5. The fact that $\mu$ is perfect can be deduced, alternatively, from Theorem 1 of [9], p. 18, since $(X, S, \mu)$ is a euclidean space.

Remark 2. If $X$ is an uncountable set, the "exact" form of Lemma 5 is false, that is to say, there do exist $S$-measurable functions $f$ for which $f^{-1}(\mathcal{R}) \ne S_f$. This follows easily from the theory of analytic sets [7].

As an immediate consequence of Lemmas 3, 4, and 5 we have: 

THEOREM 2. Corresponding to any subfield $S_0$ there exists an $S$-measurable function $f$ on $X$ into $R$ such that $S_f = S_0$ $[S, \mu]$.

The remainder of this section is devoted to showing that, for S-measurable 
functions, contraction coincides with functional contraction (Theorem 3). 

Lemma 6. If $f$ is an $S$-measurable function on $X$ into $R$, and $S_f \subset S_g$ $[S, \mu]$, then $f \subset g$ $[S, \mu]$.

Proof. Since $f$ is $S_f$-measurable (cf. Lemma 2), the hypothesis $S_f \subset S_g$ $[S, \mu]$ yields the existence of an $S_g$-measurable function, $h$ say, such that the set $\{x : f(x) \ne h(x)\}$ is $S$-$\mu$-null. Denote this last set by $N$, and let $g(X - N) = A$.

Since $h$ is $S_g$-measurable, it depends on $x$ only through $g$ (cf. Lemma 3.2 of [1]), say $h(x) = k(g(x))$ for all $x$. Define $k^* = k$ on $A$ and $= a$ on $g(X) - A$, where $a$ is a point in $f(X)$. Then $k^*$ is a function on the range of $g$ into that of $f$ such that $\{x : f(x) \ne k^*(g(x))\}$ is a subset of $N$; this completes the proof.

Let $\bar{S}$ be the class of all sets of the form $A \cup C$ where $A$ is $S$-measurable and $C$ is a subset of an $S$-$\mu$-null set, and define $\bar{\mu}(A \cup C) = \mu(A)$. Then $\bar{S}$ is a field containing $S$, $\bar{\mu}$ is a probability measure on $\bar{S}$, and $\bar{\mu}(A) = \mu(A)$ for $A \in S$. For any statistic $f$, $\bar{S}_f$ is defined, as usual, as the class of all $\bar{S}$-measurable sets of the form $f^{-1}(B)$. (Note. In general, $\bar{S}_f$ is different from $\overline{(S_f)}$.)

Lemma 7. If $f \subset g$ $[\bar{S}, \bar{\mu}]$, then $\bar{S}_f \subset \bar{S}_g$ $[\bar{S}, \bar{\mu}]$.

Proof. By hypothesis, there exists a function $h$ on the range of $g$ into that of $f$, and an $\bar{S}$-$\bar{\mu}$-null set $N$ such that $f(x) = h(g(x))$ on $X - N$. Choose and fix a set in $\bar{S}_f$, say $A = f^{-1}(B)$. Define $A^* = g^{-1}(C)$, where $C = h^{-1}(B)$.

Write $N^* = \{x : f(x) \ne h(g(x))\}$. Then $N^* \subset N$, so that $N^*$ is an $\bar{S}$-$\bar{\mu}$-null set. We have $A^* \cap (X - N^*) = \{x : g \in h^{-1}(B),\ f = hg\} = \{x : hg \in B,\ f = hg\} = \{x : f \in B,\ f = hg\} = A \cap (X - N^*)$. Hence $A^* - A \subset N^*$ and $A - A^* \subset N^*$. Since $A$ is $\bar{S}$-measurable and $\bar{\mu}$ is complete on $\bar{S}$, it follows that $A^*$ is $\bar{S}$-measurable (and therefore in $\bar{S}_g$) and that $A^*$ differs from $A$ by a set of $\bar{\mu}$-measure zero. Since $A \in \bar{S}_f$ is arbitrary, the lemma is proved.

Lemma 8. If $g$ is an $S$-measurable function on $X$ into $R$, then $S_g = \bar{S}_g$ $[\bar{S}, \bar{\mu}]$.

Proof. Since $S \subset \bar{S}$, we have $S_g \subset \bar{S}_g$; and since $g$ is $S$-measurable, $g^{-1}(\mathcal{R}) \subset S_g$ by Lemma 2. Thus $g^{-1}(\mathcal{R}) \subset S_g \subset \bar{S}_g$. The desired conclusion can now be established by showing that $\bar{S}_g \subset g^{-1}(\mathcal{R})$ $[\bar{S}, \bar{\mu}]$. The demonstration of this last relation is essentially the same as the proof of the nontrivial part of Lemma 5, and so is omitted.


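Lemma 9. If $g$ is an $S$-measurable function on $X$ into $R$, and $f \subset g$ $[S, \mu]$, then $S_f \subset S_g$ $[S, \mu]$.

Proof.
    $f \subset g$ $[S, \mu]$  $\Rightarrow$  $f \subset g$ $[\bar{S}, \bar{\mu}]$
    $\Rightarrow$  $\bar{S}_f \subset \bar{S}_g$ $[\bar{S}, \bar{\mu}]$    (Lemma 7)
    $\Rightarrow$  $\bar{S}_f \subset S_g$ $[\bar{S}, \bar{\mu}]$    (Lemma 8)
    $\Rightarrow$  $S_f \subset S_g$ $[\bar{S}, \bar{\mu}]$
    $\Rightarrow$  $S_f \subset S_g$ $[S, \mu]$.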


THEOREM 3. Let $f$ and $g$ be $S$-measurable functions on $X$ into $R$. Then $S_f \subset S_g$ $[S, \mu]$ if and only if $f \subset g$ $[S, \mu]$.

The proof is immediate from Lemmas 6 and 9. 

It can be shown by the methods used in this section that Theorems 1, 2, and 3 are valid for any probability space $(X, S, \mu)$ which satisfies the following conditions: (i) for each $x$ in $X$, $\{x\}$ is $S$-measurable, (ii) $S$ is separable, and (iii) $\mu$ is perfect. However, such a space can differ but little from a euclidean sample space.


3. Applications to the theory of sufficiency. We suppose now that there is 
given a euclidean sample space (X, S), as before, and a dominated set P of 
probability measures on S. Definitions of the technical terms used here without 
explanation are given in the first part of [1]. The conclusions of this section are 
relevant to problem 3 of [1], p. 441. 

Let $\mu$ be an arbitrary but fixed probability measure, not necessarily in $P$, such that for each $S$-measurable set $A$, $\mu(A) = 0$ if and only if $p(A) = 0$ for each $p$ in $P$. The existence of such a $\mu$ is assured by Lemma 7 of [8].

Corollary 1. There exists a function $f$ on $X$ into $R$ such that:

(a) $f$ is $S$-measurable,

(b) $S_f$ is a necessary and sufficient subfield,

(c) $f$ is a necessary and sufficient statistic.

Proof. Since $P$ is dominated, it follows from Theorem 6.2 of [1] that there exists a subfield $S_0$ (say) which is necessary and sufficient. Let $f$ be a function on $X$ into $R$ such that (a) holds, and such that $S_f = S_0$ $[S, \mu]$; such an $f$ exists, by Theorem 2. Property (b) is immediate (cf. Corollary 6.2 (iii) of [1]), and it remains to verify (c). Since $S_f$ is sufficient ($\equiv$ $f$ is sufficient) by (b), we have only to show that $f$ is a necessary statistic. Let $g$ be any sufficient statistic. Then $S_f \subset S_g$ $[S, \mu]$, since $S_g$ is sufficient by hypothesis and $S_f$ is necessary by (b). Hence $f \subset g$ $[S, \mu]$, by (a) and Lemma 6. This completes the proof.

Remark. It is evident from Lemma 5 that Corollary 1 remains valid if $S_f$ is replaced by $f^{-1}(\mathcal{R})$ in (b). It can be shown that this modified version of Corollary 1 is valid not only in the present case but in any framework $(X, S)$, $P$ provided that $P$ is a separable metric space under the metric $\delta(p, q) = \sup_{A \in S} |p(A) - q(A)|$.

Corollary 2. If $g$ is a necessary and sufficient statistic, then $S_g$ is a necessary and sufficient subfield.

Proof. We have only to show that if $g$ is a necessary statistic, then $S_g$ is a necessary subfield. Let $f$ be a function on $X$ into $R$ such that conditions (a), (b), and (c) of Corollary 1 are satisfied. Since $f$ is sufficient and $g$ is necessary, we have $g \subset f$ $[S, \mu]$. Hence $S_g \subset S_f$ $[S, \mu]$ by Lemma 9. Since $S_f$ is necessary, it follows that $S_g$ is necessary, and the proof is complete.

It should be stated here that the converse of Corollary 2 is false (cf. Example 2 in Section 4), and also that the corollary itself is false in the general case (cf. [2]).

Corollary 3. Let $g$ be an $S$-measurable function on $X$ into $R$. Then $g$ is a necessary and sufficient statistic if and only if $g^{-1}(\mathcal{R})$ is a necessary and sufficient subfield.

Proof. In view of Corollary 2 and Lemma 5, we have only to show that if $S_g$ is a necessary subfield, then $g$ is a necessary statistic; since $g$ is $S$-measurable, the desired result follows from Lemma 6 by the argument used in establishing part (c) of Corollary 1.


4. Two examples. In both examples, $X = U \times V$ is the set of all points $x = (u, v)$ with $-\infty < u < \infty$, $-\infty < v < \infty$; $S$ is the field of Borel sets of $X$; $P = \{p_\theta : -\infty < \theta < \infty\}$, where $p_\theta$ is the measure on $S$ corresponding to $u$ and $v$ being independent normally distributed random variables, with means $\theta$ and 0 respectively and variances 1; and $\mu = p_0$. Let $U$ and $V$ denote, respectively, the coordinate axes $v = 0$ and $u = 0$. Let $\mathcal{U}$ and $\mathcal{V}$ denote, respectively, the Borel sets of $U$ and $V$.

The first example shows that the following propositions are false: 

(i) If $f \subset g$ $[S, \mu]$, then $S_f \subset S_g$ $[S, \mu]$. (Cf., however, Lemmas 7 and 9.)

(ii) If $f$ is sufficient, and $f$ is a functional contraction of $g$ (that is, $f \subset g$ $[S, \mu]$), then $g$ is sufficient. (Cf. Theorem 6.4 of [1].)

EXAMPLE 1. Let $f(u, v) = u$. To define $g$, let $N \subset V$ be a set such that $N$ has linear measure zero but is not in $\mathcal{V}$. Let $g(u, v) = u$ for $v \in V - N$, and $g(u, v) = 0$ for $v \in N$. Then $f \subset g$ $[S, \mu]$, and also $g \subset f$ $[S, \mu]$, so that $f$ and $g$ are functionally equivalent. However, it is easily seen from Fubini's theorem ([6], p. 83) that $S_f = f^{-1}(\mathcal{U})$ while $S_g$ contains only $X$ and the empty set.

The second example shows that the following propositions are false: 

(iii) If $S_f \subset S_g$ $[S, \mu]$, then $f \subset g$ $[S, \mu]$. (Cf., however, Lemma 6.)

(iv) If $S_f$ is a necessary and sufficient subfield, then $f$ is a necessary and sufficient statistic. (Cf., however, Corollary 3 together with Lemma 5.)

EXAMPLE 2. Define $g(u, v) = u$. To define $f$, let $M \subset V$ be a set which is not measurable with respect to linear measure on $V$, and let $f(u, v) = (u, 1)$ for $v \in M$ and $f(u, v) = (u, 2)$ for $v \in V - M$. We shall show that $S_f = S_g$, so that $f$ and $g$ are equivalent, but that $f$ is not a functional contraction of $g$.

Let $U_1 = \{x : v = 1\}$, $U_2 = \{x : v = 2\}$. Since $g$ is exactly a function of $f$, $S_g \subset S_f$. To prove the converse, consider a fixed $C \in S_f$. There exists a set $B_1 \subset U_1$ and a $B_2 \subset U_2$ such that $C = f^{-1}(B_1 \cup B_2)$. Let the perpendicular projections of $B_1$, $B_2$ on $U$ be $A_1$, $A_2$, respectively. Then, by the definition of $f$, $C = E \cup F \cup G$, where $E = (A_1 \cap A_2) \times V$, $F = (A_1 - A_2) \times M$, and $G = (A_2 - A_1) \times (V - M)$. Since $C$ is a Borel set while $M$ and $V - M$ are not, it follows ([6], p. 83) that $F$ and $G$ must be empty. Hence $A_1 = A_2 = A$ say, and $C = A \times V = g^{-1}(A)$. It now follows ([6], p. 83) that $A$ is in $\mathcal{U}$, so that $g^{-1}(A) = C$ is in $g^{-1}(\mathcal{U})$. Since $C$ is arbitrary, we have $S_f \subset g^{-1}(\mathcal{U})$; but $g^{-1}(\mathcal{U}) = S_g$, so that $S_f \subset S_g$. Thus $S_f = S_g$.

To show that $f$ is not a functional contraction of $g$, suppose to the contrary that $f \subset g$ $[S, \mu]$. Then $f \subset g$ $[\bar{S}, \lambda]$, where $\bar{S}$ denotes the Lebesgue measurable sets of $X$ and $\lambda$ is (planar) Lebesgue measure on $\bar{S}$. In other words, there exists a function $h$ on $U$ into $U_1 \cup U_2$ and an $\bar{S}$-$\lambda$-null set $N$ such that $f(x) = h(g(x))$ on $X - N$. Write $h^{-1}(U_1) = I$, $U \times M = J$, and $I \times V = K$. Then $J = f^{-1}(U_1)$ and $K = g^{-1}(h^{-1}(U_1))$, and it follows exactly as in the proof of Lemma 7 that the sets $J - K$ and $K - J$ are $\bar{S}$-$\lambda$-null. Hence $L = (J - K) \cup (K - J)$ is $\bar{S}$-$\lambda$-null.

For each $u \in U$, let $E_u$ be the set of all $v \in V$ such that $(u, v) \in L$. Let $\lambda_u$, $\lambda_v$ denote linear measure on $U$, $V$, respectively. It follows from Fubini's theorem ([6], p. 81) that there exists a $C \subset U$ with $\lambda_u(C) = 0$ such that, for each $u \in U - C$, the set $E_u$ is $\lambda_v$-measurable (and of $\lambda_v$-measure zero). Since $u \in I$ implies $E_u = V - M$, and $u \in U - I$ implies $E_u = M$, and since at least one of the sets $I - C$, $U - I - C$ must be nonempty (because $\lambda_u(C) = 0$), it follows that $M$ is $\lambda_v$-measurable, and this is a contradiction.

It can be shown by a slight elaboration of the preceding argument that in Example 2 we have $\bar{S}_f \subset \bar{S}_g$ $[\bar{S}, \bar{\mu}]$, but not $f \subset g$ $[\bar{S}, \bar{\mu}]$. This, together with Lemma 7, shows that by completing a given probability space $(X, S, \mu)$ to $(X, \bar{S}, \bar{\mu})$ the inconsistency between contraction and functional contraction is reduced but not eliminated entirely.


ESTIMATION OF PARAMETERS OF TRUNCATED OR CENSORED
EXPONENTIAL DISTRIBUTIONS


By Walter L. Deemer, Jr. AND David F. Votaw, Jr.


United States Air Force and Yale University


1. Summary. This paper gives maximum likelihood estimators of parameters 
of truncated and censored exponential distributions, asymptotic variances of 
the estimators, and asymptotic confidence intervals for the parameters. 

Applications to bombing accuracy studies and to life testing are pointed out. 
As regards bombing accuracy the parameter estimated is the reciprocal of the 
variance in a normal bivariate distribution having circular symmetry. The 
reciprocal is estimated because there is no maximum likelihood estimator of 
the variance and any estimator of the variance is badly biased (see Section 2). 

Results of a synthetic sampling experiment are given to provide information 
on rapidity of convergence of the distributions of the estimators to their asymp- 
totic distributions. 


2. Introduction. In bombing accuracy studies and in other aiming accuracy 
studies, the assumption is often made that aiming errors (range and deflection 
errors in bombing; azimuth and elevation errors in gunnery) have a bivariate 
normal distribution with mean at the aiming point, zero correlation and equal 
variances. 

Under these assumptions the radial error, or distance from the aiming point to the point of impact, is a chance quantity, say $R$, with probability density function

(2.1)    $k(r) = r\sigma^{-2} \exp[-r^2/(2\sigma^2)], \quad 0 < r < \infty$.

Let $\tfrac{1}{2}R^2 = Z$, say, and denote $\sigma^{-2}$ by $c$. The density, say $h(z)$, of $Z$ is

(2.2)    $h(z) = ce^{-cz}, \quad 0 < z < \infty;\ c > 0$;

thus $Z$ has an exponential distribution.

In some situations values of Z greater than a fixed value cannot be observed. 
For example, in gun camera missions the view angle of the camera defines the 
maximum observable R (and thus the maximum observable Z). An example 
arises in life testing from an exponential distribution when the time of testing is fixed in advance (see [3], pp. 4-9). (Cases in which the time of testing is determined by a sample are treated in [1], [3], [4], and [6], p. 416.)

Before proceeding with the estimation in truncated and censored cases let us consider estimation of $c$ in (2.2) on the basis of a sample $Z_1, Z_2, \ldots, Z_N$


Received November 4, 1953; revised November 20, 1954. 
of values of $Z$. The likelihood function, $L(c)$, of $c$ is

(2.3)    $L(c) = \{ce^{-c\bar{z}}\}^N$,

where $\bar{z}$ is the sample mean. The value, say $\hat{c}$, of $c$ for which $L(c)$ assumes its maximum value is

(2.4)    $\hat{c} = (\bar{z})^{-1}$, the maximum likelihood estimator of $c$.


The estimator $\hat{c}$ has a finite mean if $N \ge 2$, and a finite variance if $N \ge 3$.

It is well known that $2Nc/\hat{c}$ has a chi-square distribution with $2N$ degrees of freedom. Equation (2.4) is equivalent to the well-known result that the maximum likelihood estimator, say $\hat{\sigma}^2$, of $\sigma^2$ is

(2.5)    $\hat{\sigma}^2 = \sum_{i=1}^{N} r_i^2 / 2N$.

The asymptotic variance of $(N)^{1/2}(\hat{c} - c)$ is

    $[-E(\partial^2 \log h(z)/\partial c^2)]^{-1}$.

From (2.2) we have that this equals $c^2$; therefore, for large $N$,

(2.6)    Variance $[(N)^{1/2}(\hat{c} - c)] = c^2$.


Derivations of the asymptotic variance of a maximum likelihood estimator are 
given in [6], pp. 208-212, and [7], pp. 136-139. 
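For a complete (untruncated, uncensored) sample, (2.4) and (2.6) translate into a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the interval is an assumed plug-in normal approximation that replaces $c$ by $\hat{c}$ in (2.6); it is not taken from the paper.

```python
from math import sqrt


def exp_rate_mle(z):
    """c_hat = 1 / (sample mean), as in (2.4)."""
    if not z or sum(z) <= 0:
        raise ValueError("need a nonempty sample with positive sum")
    return len(z) / sum(z)


def asymptotic_interval(z, quantile=1.96):
    """Plug-in interval from (2.6): Var(c_hat) is roughly c^2 / N for large N.

    Assumption of this sketch: c is replaced by c_hat in the variance and a
    normal quantile is used (1.96 gives a nominal 95 percent interval).
    """
    c_hat = exp_rate_mle(z)
    half_width = quantile * c_hat / sqrt(len(z))
    return c_hat - half_width, c_hat + half_width


if __name__ == "__main__":
    sample = [0.5, 1.5, 1.0, 1.0]  # made-up values with mean 1
    print(exp_rate_mle(sample), asymptotic_interval(sample))
```

With only four observations the asymptotic interval is of course very wide; the example is meant to show the arithmetic, not a realistic sample size.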

When the distribution is truncated or censored, we shall replace $Z$ by $X$ and denote by $x_0$ the maximum value of $X$ that can be observed. It is assumed that $x_0$ is known in advance. The two cases will now be described.

Case A (Censored² Distribution). Here the number of observations greater than $x_0$ is known. When $Z \le x_0$, $X = Z$; when $Z > x_0$, the only information obtained about $X$ is simply that $X > x_0$. $X$ can be regarded as having a density, say $g(x)$, when $X \le x_0$; thus

(2.7)    $g(x) = ce^{-cx}, \quad 0 < x \le x_0$,
         $\Pr(X > x_0) = e^{-cx_0}$.


Case B (Truncated Distribution). Here the number of observations greater than $x_0$ is unknown. $X$ has a density, say $f(x)$, which is the conditional density of $Z$ given that $Z \le x_0$; thus

(2.8)    $f(x) = ce^{-cx}(1 - e^{-cx_0})^{-1}, \quad 0 < x \le x_0$.


The maximum likelihood estimator of $c$ will be derived for Case A and for Case B. It is noteworthy that in each case no maximum likelihood estimator of $\sigma^2$ ($= c^{-1}$) exists and the bias of any estimator of $\sigma^2$ tends to $-\infty$ as $\sigma^2$ tends to $+\infty$. For this reason the quantity $c$ instead of $c^{-1}$ is chosen as the parameter to be estimated.


² For further discussion of censored and truncated distributions see [2], p. 144.
3. Maximum-likelihood estimators. For Case A let $n$ be the number of
observations of $X$ such that $X \le x_0$ and let $m$ be the number of observations
such that $X > x_0$. Let $N = m + n$. The likelihood function, say $L_A(c)$, of $c$ is
(see (2.7)),


(3.1) $L_A(c) = \begin{cases} N!\,[m!\,n!]^{-1}\, c^n \exp\left[-c \sum_{i=1}^{n} x_i - m c x_0\right], & n > 0, \\ e^{-N c x_0}, & n = 0. \end{cases}$

(It should be noted that this is the likelihood function of a chance quantity
having the density given in (2.7) and a probability $e^{-c x_0}$ of taking the value
$x_0$. Halperin [3], pp. 4-9, has proved that the maximum likelihood estimator
of $c$ in this mixed continuous discrete case has the properties of consistency,
asymptotic normality, and minimum asymptotic variance.)

The maximum-likelihood estimator, say $\hat c_A$, of $c$ is


(3.2) $\hat c_A = n \Big/ \left(m x_0 + \sum_{i=1}^{n} x_i\right).$


$\hat c_A$ has a finite mean if $N \ge 2$ and a finite variance if $N \ge 3$.
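For Case B let the sample be $X_1, \cdots, X_{n'}$. The likelihood function, say
$L_B(c)$, is (see (2.8)),


(3.3) $L_B(c) = c^{n'} (1 - e^{-c x_0})^{-n'} \exp\left[-c \sum_{i=1}^{n'} x_i\right] = \left[c\, e^{-c \bar x} (1 - e^{-c x_0})^{-1}\right]^{n'},$


where $\bar x$ is the sample mean. It follows that


(3.4) $\partial \log L_B(c)/\partial c = n'\left[c^{-1} - x_0 e^{-c x_0} (1 - e^{-c x_0})^{-1} - \bar x\right].$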


It can be shown that the function $c^{-1} - x_0 e^{-c x_0} (1 - e^{-c x_0})^{-1}$ is monotonic
decreasing in $c$; as $c$ tends to 0 the function tends to $\tfrac12 x_0$, and as $c$ tends to infinity
the function tends to 0. When $0 < \bar x < \tfrac12 x_0$, there exists a solution, say $c'$, of the
equation formed by setting $\partial \log L_B(c)/\partial c$ equal to 0 (see (3.4)). Clearly $c'$ is the
maximum likelihood estimator of $c$ when $0 < \bar x < \tfrac12 x_0$. When $\bar x \ge \tfrac12 x_0$, the
function $L_B(c)$ assumes its maximum value for $c = 0$. The maximum likelihood
estimator, say $\hat c_B$, of $c$ can be described as follows:

(3.5) $\hat c_B = \begin{cases} c', & \text{when } 0 < \bar x < \tfrac12 x_0, \\ 0, & \text{when } \bar x \ge \tfrac12 x_0. \end{cases}$

A table of $\bar x / x_0$ as a function of $c' x_0$ is given in Table 1.

The estimator $\hat c_B$ is less than $n' \left(\sum_{i=1}^{n'} x_i\right)^{-1}$, which is the estimator $\hat c$ when
$n' = N$ (see (2.4)). This follows from the fact that when $n' = N$, $L_B(c) =
L(c)(1 - e^{-c x_0})^{-n'}$ (see (2.3), (3.3)). The estimator $\hat c_B$, therefore, has finite mean
for $n' \ge 2$ and finite variance for $n' \ge 3$.
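The two estimators above can be sketched numerically. The following is only an illustration under stated assumptions: the function names are hypothetical, and bisection is one convenient way of solving the equation obtained from (3.4), not a method stated in the paper.

```python
import math

def mle_censored(observed, m, x0):
    """Estimator (3.2): n / (m*x0 + sum of the n observed values).

    Returns 0.0 when no observation falls at or below x0 (n = 0).
    """
    n = len(observed)
    if n == 0:
        return 0.0
    return n / (m * x0 + sum(observed))

def mean_ratio(t):
    """1/t - 1/(e^t - 1): the value of xbar/x0 at t = c'*x0, for t > 0."""
    if t > 700.0:              # e^t would overflow; the second term is negligible
        return 1.0 / t
    return 1.0 / t - 1.0 / math.expm1(t)

def mle_truncated(xbar, x0, tol=1e-12):
    """Estimator (3.5): root of mean_ratio(c*x0) = xbar/x0, or 0 if xbar >= x0/2."""
    target = xbar / x0
    if target >= 0.5:
        return 0.0
    lo, hi = 1e-6, 1.0
    while mean_ratio(hi) > target:      # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if mean_ratio(mid) > target:    # the function is decreasing in t
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / x0
```

For instance, `mean_ratio(1.0)` is about 0.4180, which agrees with the Table 1 entry that begins the second row of the table.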
TABLE 1

$\bar x / x_0 = \dfrac{1}{c' x_0} - \dfrac{1}{e^{c' x_0} - 1}$

(Rows give the integer part of $c' x_0$, columns the first decimal; … marks an entry illegible in the source.)

 c'x_0 |  .0   |  .1   |  .2   |  .3   |  .4   |  .5   |  .6   |  .7   |  .8   |  .9
   0   |   …   | .4916 | .4832 | .4750 | .4668 | .4584 | .4504 | .4422 | .4340 |   …
   1   | .4180 | .4102 | .4024 | .3946 | .3870 | .3794 | .3720 | .3648 | .3576 | .3504
   2   | .3434 | .3366 | .3300 | .3234 | .3168 | .3106 | .3044 | .2984 | .2924 | .2866
   3   | .2810 | .2754 | .2700 |   …   | .2596 | .2546 | .2496 | .2450 | .2402 | .2358
   4   | .2314 | .2270 | .2228 | .2188 | .2148 | .2110 | .2072 | .2036 | .2000 | .1966
   5   | .1932 | .1900 | .1868 |   …   | .1806 | .1778 | .1748 | .1720 | .1694 | .1668
   6   | .1642 | .1616 | .1592 | .1568 | .1546 | .1524 | .1502 | .1480 | .1460 | .1440
   7   | .1420 | .1400 | .1382 | .1364 | .1346 | .1328 | .1310 | .1294 | .1278 | .1262
   8   | .1246 | .1232 | .1216 | .1202 | .1188 | .1174 | .1160 | .1148 | .1134 | .1122




4. Asymptotic variances of the estimators. With regard to Case A we have
from results of Halperin [3], pp. 4-9, that the asymptotic variance of
$N^{1/2}(\hat c_A - c)$ is the reciprocal of


(4.1) $\int_0^{x_0} \left[\dfrac{\partial \log g(x)}{\partial c}\right]^2 g(x)\,dx + q \left[\dfrac{\partial \log q}{\partial c}\right]^2,$


where $q = \Pr(X > x_0) = e^{-c x_0}$ (see (2.7)).
The expression in (4.1) equals


(4.2) $c^{-2}(1 - e^{-c x_0})$;

accordingly, for large $N$

(4.3) Variance $[N^{1/2}(\hat c_A - c)] = c^2 (1 - e^{-c x_0})^{-1}$.

Note that this is always greater than the asymptotic variance of $N^{1/2}(\hat c - c)$
(see (2.6)).

The asymptotic variance of $(n')^{1/2}(\hat c_B - c)$ is the reciprocal of

$-E(\partial^2 \log f(x)/\partial c^2),$

where $f(x)$ is given in (2.8). Thus for large $n'$

(4.4) Variance $[(n')^{1/2}(\hat c_B - c)] = \left[c^{-2} - x_0^2 e^{-c x_0} (1 - e^{-c x_0})^{-2}\right]^{-1}$.


Having obtained the asymptotic variances of $\hat c_A$ and $\hat c_B$ let us compare them.
The comparison will be made for $n' = N$, which is the most favorable situation
for Case B. Let


(4.5) $R = \dfrac{\text{Variance}\,[(n')^{1/2}(\hat c_B - c)]}{\text{Variance}\,[(n')^{1/2}(\hat c_A - c)]}, \quad n' = N.$


From (4.3) and (4.4) it follows that

(4.6) $R = (1 - e^{-c x_0}) \big/ \left[1 - (c x_0)^2 e^{-c x_0} (1 - e^{-c x_0})^{-2}\right].$
TABLE 2
Ratio of the Variances of $\hat c_B$ and $\hat c_A$

Column headings: $R(t)$ | $t = c x_0$

[The values of $t$ and the remaining columns of this table are illegible in the source. Legible values of $R(t)$, in printed order: 1194, 594, 294, 194, 144, 113; 54.6, 34.8, 24.9, 19.1, 15.3, 12.6, 10.7, 9.15, 7.97.]



$R$ can be considered as a function of $c x_0 = t$, say. A table of $R$ as a function
of $t$ is given in Table 2. ($R(t) > 1$ for $t > 0$, and $R(t) \to \infty$ as $t \to 0$.)
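As a quick numerical check of (4.6), the ratio can be evaluated directly. This is a minimal sketch; the function name `variance_ratio` is hypothetical and not part of the paper.

```python
import math

def variance_ratio(t):
    """R(t) from (4.6), with t = c*x0 > 0."""
    q = math.exp(-t)
    return (1.0 - q) / (1.0 - t * t * q / (1.0 - q) ** 2)

# A few values over the range where the truncated estimator loses most.
for t in (0.01, 0.1, 1.0, 5.0):
    print(f"t = {t:5.2f}   R(t) = {variance_ratio(t):10.3f}")
```

Under this formula R(0.01) is close to 1194 and R(1) is close to 7.97; both figures appear among the legible entries of Table 2, and the values decrease toward 1 as t grows.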


5. Interval estimation of c. Approximate $100(1 - q)$ per cent confidence
limits for $c$ in (2.7) can be obtained by means of the following approximation
when the sample size is large:


(5.1) $\Pr(-y_q < y < y_q) = 1 - q,$


where $y_q$ is the $100(1 - \tfrac12 q)$ per cent point of the standard normal
distribution and


(5.2) $y = N^{1/2}(\hat c_A - c) \big/ \left[c\,(1 - e^{-c x_0})^{-1/2}\right].$


Similarly, when the sample size is large, $100(1 - q)$ per cent confidence limits
for $c$ in (2.8) can be obtained by means of (5.1), where


(5.3) $y = (n')^{1/2}(\hat c_B - c)\left[c^{-2} - x_0^2 e^{-c x_0} (1 - e^{-c x_0})^{-2}\right]^{1/2}.$


The procedure given in [6], Section 11.7, for constructing confidence limits
could be used in the cases discussed above.


6. Synthetic sampling experiment. To throw some light on the rapidity of
approach of the distributions of $\hat c_A$ and $\hat c_B$ to their limiting normal distributions
we have carried out a synthetic sampling experiment. With regard to $\hat c$ the
rapidity of approach can be determined by analytic methods since the exact
distribution of $\hat c$ is known (see Section 2).

A random sample of 140,000 cases was drawn from a rectangular distribution
and randomly divided into seven sets of 20,000 cases each. Three of these
seven sets were divided into 200 samples of 100 cases each; the other four sets
were divided into 100 samples of 200 cases each. The variable with the rectangular
distribution was then converted (a) to a variable with density function as
given in (2.2) with $c = 1$, and (b) to a variable with density function as given
in (2.8) with $c = 1$ and $x_0 = 1$. The variable of (a) was used to calculate $\hat c$ for
each sample (600 samples of 100 cases each; 400 samples of 200 cases each);
this distribution was then censored at $x_0 = 1$ and $\hat c_A$ was calculated for each of
the 1000 samples. The variable of (b) was used to calculate $\hat c_B$ for each of the
1000 samples. The goodness of fit of the limiting normal distributions to the
observed distributions of $\hat c_A$ and $\hat c_B$ was tested by chi-square. The goodness of
fit of the exact distribution of $\hat c$ to the observed distribution was tested
similarly. The chi-square probabilities are given in Table 3. Each of the seven
lines of Table 3 represents one of the seven independent sets of 20,000 cases.
The three values of the chi-square probability, P, on a given line are not
independent because they are based on the same samples.

TABLE 3
Synthetic sampling experiment

P is the probability that values of $\chi^2$ as large or larger than that obtained would
have been obtained under the null hypothesis.*

[The body of this table is not recoverable from the source. Its columns are: serial number of set of 20,000 cases; number of cases per sample; number of samples; and $\chi^2$ with P for each estimator.]

* Equi-probability intervals (.05) were used throughout; thus there are 19 degrees of
freedom.

The results suggest that when $c x_0$ is as small as 1 and the sample size is as
small as 100, the distributions of the estimators are fairly well approximated
by the limiting distributions. With less severe limitations (i.e., $c x_0 > 1$) the
approximation would be better.
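One sample of the experiment described in this section can be imitated as follows. This is a sketch under assumptions: the paper does not say how the rectangular variable was converted, so the inverse-distribution-function transform used here is an assumed choice, and the function names and the seed are hypothetical.

```python
import math
import random

def to_exponential(u, c=1.0):
    """Map u in [0, 1) to a variable with density c*exp(-c*z), z > 0."""
    return -math.log(1.0 - u) / c

def to_truncated(u, c=1.0, x0=1.0):
    """Map u in [0, 1) to a variable with the truncated density (2.8)."""
    return -math.log(1.0 - u * (1.0 - math.exp(-c * x0))) / c

rng = random.Random(1955)                      # arbitrary seed
u = [rng.random() for _ in range(100)]         # one sample of 100 cases

z = [to_exponential(v) for v in u]
c_hat = len(z) / sum(z)                        # estimator (2.4)

x0 = 1.0
observed = [v for v in z if v <= x0]           # censor the same sample at x0
m = len(z) - len(observed)
c_hat_A = len(observed) / (m * x0 + sum(observed))   # estimator (3.2)

x = [to_truncated(v) for v in u]               # truncated variable for c_hat_B
x_bar = sum(x) / len(x)                        # enters (3.5) through xbar/x0
```

Repeating this over many samples and comparing the resulting estimates with the limiting normal distributions is the kind of check summarized in Table 3.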
ESTIMATION OF THE MEAN AND STANDARD DEVIATION BY ORDER
STATISTICS. PART II 


By A. E. SARHAN 
University of North Carolina 


1. Introduction. In a previous paper [3], the best linear estimates of the mean 
and standard deviation for the rectangular, triangular, double exponential, 
and the exponential distributions were worked out. The best linear estimates 
were obtained by ranking the observations in ascending order and finding the 
best linear combination of them [2]. The variation of the coefficients in the 
estimates and the efficiencies of some other linear estimates were discussed. 

This paper—which is a continuation of the previous one [3]—deals with three 
distributions: a U-shaped, a parabolic, and a skewed one. The same items were 
worked out for these distributions as for those in the previous paper. Also, a 
general idea of the natural sequence of the coefficients in the best linear estimate 
of the mean as the shape of the distribution undergoes change will be considered. 

The mathematical formulae for this work will not be given as they are similar 
to those given in [3]. 


2. U-shaped population. The frequency distribution of a U-shaped population
is


(2.1) $f(y) = \dfrac{3}{2\theta_2}\left(\dfrac{y - \theta_1}{\theta_2}\right)^2, \quad \theta_1 - \theta_2 \le y \le \theta_1 + \theta_2,$


where $\theta_1$ is the mean and $\theta_2$ is half the range. Standardizing the variable we get


(2.2) $f(x) = \tfrac32 x^2, \quad -1 \le x \le +1.$


The coefficients $\alpha_{1i}$ in the best linear estimates of the mean are given in Table
I such that


(2.3) $\theta_1^* = \sum_{i=1}^{n} \alpha_{1i}\, y_{(i)},$


where $y_{(i)}$ is the $i$th ordered sample element.
Since


(2.4) $V(y) = \tfrac35 \theta_2^2,$


we can estimate the standard deviation $\sigma$ by $\sqrt{3/5}\;\theta_2^*$ and the coefficients can
be adjusted to give the best linear estimate of the standard deviation $\sigma^*$. These
adjusted coefficients for which


(2.5) $\sigma^* = \sum_{i=1}^{n} \alpha_{2i}\, y_{(i)}$

are also shown in Table I.
TABLE II

Variances of the best linear estimates of the mean and standard deviation in
different populations ($\sigma = 1$)

(… marks an entry illegible in the source.)

 Population and sample size    | Variance of the estimate of
                               | mean       | standard deviation
 n = 2   U-shaped              |    …       | .6333333
         Rectangular           |    …       | .5000000
         Parabolic             |    …       | .5123457
         Triangular            |    …       | .5306132
         Normal                |    …       | .57079
         Double exponential    |    …       | .7777778
 n = 3   U-shaped              | .2501299   | .2161616
         Rectangular           | .3000000   | .2000000
         Parabolic             | .3208101   | .2220975
         Triangular            | .3293975   | .2414966
         Normal                | .3333333   | .27548
         Double exponential    | .2947532   | .4320999
 n = 4   U-shaped              | .1279837   | .0955036
         Rectangular           | .2000000   | .1111111
         Parabolic             | .2315500   | .1335981
         Triangular            | .2443499   | .1514217
         Normal                | .2500000   | .18005
         Double exponential    | .2077706   | .2986242
 n = 5   U-shaped              | .0675462   | .0470213
         Rectangular           | .1428571   | .0714286
         Parabolic             | .1790064   | .0925499
         Triangular            | .1934059   | .1079590
         Normal                | .2000000   | .13332
         Double exponential    | .1584266   | .2288250



The variances of the estimates of the mean and standard deviation are given 
in Table II. Furthermore, the relative efficiencies of the sample mean, median, 
and the midrange as estimates of the population mean are shown in Table III. 
Similarly the relative efficiencies of the range, the normal estimate, and Gini’s 
estimate are also given in the same table. The efficiencies are calculated relative 
to the best linear estimate. 
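The efficiency of the sample mean relative to the best linear estimate can be recomputed from the variances as a check. The sketch below assumes that $\sigma = 1$, so that the sample mean has variance $1/n$, and it reads the variances .2501299, .1279837, and .0675462 of Table II as the U-shaped entries for $n = 3, 4, 5$; that reading is an assumption, and the function name is hypothetical.

```python
# Variances of the best linear estimate of the mean (sigma = 1), read from
# Table II as the U-shaped population for n = 3, 4, 5 (assumed row labels).
ble_variance = {3: 0.2501299, 4: 0.1279837, 5: 0.0675462}

def sample_mean_efficiency(n, var_ble):
    """Percentage efficiency of the sample mean: 100 * Var(BLE) / (1/n)."""
    return 100.0 * var_ble * n

efficiencies = {n: round(sample_mean_efficiency(n, v), 2)
                for n, v in ble_variance.items()}
print(efficiencies)
```

The three results, 75.04, 51.19, and 33.77, are the sample-mean efficiencies printed in Table III, which supports the reading of both tables.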

Table I shows that the two extreme values in the estimate of the mean have 
large weights while the middle elements have negative weights. 

Comparing the efficiencies (Table III) of the estimates of the mean, we see 
that the midrange is more efficient than either the sample mean or the median. 
Again, the range as an estimate of standard deviation has a higher efficiency 
than either the normal or Gini's estimate. So, the midrange and the range
(which are based on the two extreme values) can be used to estimate the population
mean and standard deviation in this distribution for the sample sizes without
great loss of accuracy.

TABLE III

Percentage efficiencies of certain estimates of the mean and standard deviation
relative to BLE, in different populations, from ordered samples of size n

(… marks an entry illegible in the source.)

 Population and       | Estimates of the mean     | Estimates of standard deviation
 sample size          | $\bar y$ | $\tilde y$ | $w$     | R      | N      | G
 n = 2   U-shaped     | 100      | 100        | 100     | 100    | 100    | 100
         Parabolic    | 100      | 100        | 100     | 100    | 100    | 100
         Skewed       | 100      | 100        | 100     | 100    | 100    | 100
 n = 3   U-shaped     | 75.04    | 30.57      | 98.77   | 100    | 100    | 100
         Parabolic    | 96.24    | 60.25      | 98.83   | 100    | 100    | 100
         Skewed       | 97.41    | 61.29      | 98.59   | 99.57  | 99.57  | 99.57
 n = 4   U-shaped     | 51.19    | 23.95      | 97.16   |   …    | 86.34  | 90.99
         Parabolic    | 92.62    | 65.27      | 97.85   |   …    | 52.80  | 97.73
         Skewed       | 95.91    | 68.59      | 97.61   |   …    | 98.56  | 97.64
 n = 5   U-shaped     | 33.77    | 9.36       | 95.39   |   …    | 77.50  | 69.27
         Parabolic    | 89.50    | 49.56      | 95.91   |   …    | 96.03  | 91.49
         Skewed       | 91.36    | 51.58      | 93.05   |   …    | 97.40  | 95.51

Here $\bar y$ denotes the sample mean, $\tilde y$ denotes the median, $w$ denotes the midrange, R denotes
the range, N denotes the normal estimate, and G denotes Gini's mean difference.


3. Parabolic population. The frequency distribution of a parabolic population
is


(3.1) $f(y) = \dfrac{6}{\theta_2^3}\left(y - \theta_1 + \tfrac12\theta_2\right)\left(\theta_1 + \tfrac12\theta_2 - y\right), \quad \theta_1 - \tfrac12\theta_2 \le y \le \theta_1 + \tfrac12\theta_2,$


where $\theta_1$ is the true mean and $\theta_2$ is the range. Standardizing the variable we get

(3.2) $f(x) = 6x(1 - x), \quad 0 \le x \le 1.$


The coefficients $\alpha_{1i}$ in the best linear estimate of the mean ($\theta_1^*$) are given in
Table I.


Since

(3.3) $V(y) = \tfrac{1}{20}\theta_2^2,$


we can estimate the standard deviation $\sigma$ by $(1/\sqrt{20})\,\theta_2^*$ and the coefficients
can be adjusted to give the best linear estimate of the standard deviation $\sigma^*$.
These adjusted coefficients for which


(3.4) $\sigma^* = \sum_{i=1}^{n} \alpha_{2i}\, y_{(i)}$

are given in Table I. The variances of the estimates of the mean and standard
deviation are given in Table II.

Table III gives the percentage efficiencies of the different estimates relative 
to the best linear estimate. In the best linear estimate of the mean we find 
that the extreme values have higher weights while the middle elements have 
smaller positive weights (decreasing towards the middle). 

For the given sample sizes, the midrange as an estimate of the population 
mean is shown to be more efficient than the sample mean (Table III), while the 
median has low efficiency. Furthermore, the range as an estimate of the standard 
deviation is more efficient than either the normal or the Gini’s estimate as 
shown in Table III.


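4. A skewed population. The frequency distribution of a skewed population is


(4.1) $f(y) = \dfrac{12}{\theta_2}\left(\dfrac{y - \theta_1}{\theta_2} + \dfrac23\right)^2\left(\dfrac13 - \dfrac{y - \theta_1}{\theta_2}\right), \quad \theta_1 - \tfrac23\theta_2 \le y \le \theta_1 + \tfrac13\theta_2,$


where $\theta_1$ is the true mode and $\theta_2$ is the true range. Let


(4.2) $x = \dfrac{y - \theta_1}{\theta_2} + \dfrac23$


to get

(4.3) $f(x) = 12x^2(1 - x).$


Since the population mean is $\theta_1 - \theta_2/15$, and the population standard deviation
is $\theta_2/5$, the coefficients can be adjusted to give the estimates for the
mean $\mu$ and the standard deviation $\sigma$. These can be obtained from


(4.4) $\mu^* = \theta_1^* - \theta_2^*/15,$
(4.5) $\sigma^* = \theta_2^*/5.$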
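The constants 1/15 and 1/5 used above, and the 1/20 of the parabolic case, follow from the moments of the standardized densities. A minimal check in exact rational arithmetic is sketched below; the helper name `poly_moment` is hypothetical.

```python
from fractions import Fraction

def poly_moment(coeffs, k):
    """Integral over [0, 1] of x**k * sum(coeffs[j] * x**j), computed exactly."""
    return sum(Fraction(a) / (j + k + 1) for j, a in enumerate(coeffs))

def mean_and_variance(coeffs):
    mean = poly_moment(coeffs, 1)
    return mean, poly_moment(coeffs, 2) - mean ** 2

# Skewed density (4.3): 12 x^2 (1 - x) = 12 x^2 - 12 x^3, with mode at x = 2/3.
skewed = [0, 0, 12, -12]
skew_mean, skew_var = mean_and_variance(skewed)
mode = Fraction(2, 3)

# Parabolic density (3.2): 6 x (1 - x) = 6 x - 6 x^2.
parabolic = [0, 6, -6]
par_mean, par_var = mean_and_variance(parabolic)

print(skew_mean - mode, skew_var, par_var)
```

In units of the range, the skewed mean lies 1/15 below the mode and its variance is 1/25, so the standard deviation is 1/5; the parabolic variance is 1/20.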


The adjusted coefficients in the BLE of the mean u* and standard deviation 
o* are given in Table I. The efficiencies of the estimates are given in Table III. 

In this case, again, we find that the two extreme sample elements have the 
greatest numerical weights in the BLE while the other values have smaller 
weights. It is of interest to see that the least sample value (the extreme value 
on the side of the long tail) has a smaller coefficient than the largest sample 
value (the other extreme on the side of the shorter tail). This is to be expected 
since extreme values from the longer tail occur more often and tend to upset 
the estimate. It throws some light on the effect of the shape of the distribution 
or the length of its tails on the coefficients of the BLE. This is not the only 
relation, however, and the nature of the general relation is not yet well known. 

The midrange has a higher efficiency for the given sample sizes than that of 
the sample mean while the median has a lower efficiency. Again, the range 
has a higher efficiency than either the normal or the Gini’s estimate. 
5. Coefficients in the BLE of the mean for symmetric distributions. We have
seen in [3], and in the previous sections that the coefficients in the best linear 
estimate of the mean vary as the parent distribution undergoes change. It is of 
interest to notice the sequence of this variation. The sample elements may have 
equal weights or smaller weights at the middle than at the tail, or zero weights 
at the middle and equal weights at the extremes or large weights on the tails 
and negative weights in the middle. There is a sequence in which the middle 
elements are to be equally weighted, zero weighted, and negatively weighted.¹
It seems that the full sequence is missing its natural extension and the complete 
sequence should read: 

(a) negative weights in the middle and large positive weights at the tails, 

(b) zero weights in the middle and equal weights at the tails, 

(c) less weights in the middle than at the tails, 

(d) equal weights throughout, 

(e) more weight in the center and less weights in tails, but all positive weights, 

(f) middle observations receive all the weight, others nothing, 

(g) middle observations receive more than unity and tails take on negative 
weights. 

This is the sequence which might be anticipated. The results show that (a) 
is U-shaped; (b) is rectangular; (c) is triangular or parabolic; (d) is normal; (e) 
is double exponential; (f) is the case where the median gets all the weight, which 
is like a double exponential but not exactly. For (g) the author does not know 
any example at this time, i.e., a distribution where it would be best to estimate 


the mean by giving the middle element a weight greater than one and to give the 
elements on the tails some negative weights. This represents, however, a natural 
continuity in the sequence. 


6. The variances of the best linear estimates. Table II gives the variances of 
the best linear estimates of the mean and standard deviation in different sym- 
metric distributions with o = 1. The variances of the estimates of the normal 
population are obtained from Tables 5 and 6 in [1] calculated to five decimal 
places. 

The table shows that the variance of the best linear estimate of the mean 
of a U-shaped population (for n > 2) is the least among the given distributions. 
This raises the theoretical problem of finding the distribution whose mean can 
be estimated with the least variance. 

The same table shows also that the variance of the estimate of the mean 
increases gradually from the case of the U-shaped distribution to the rectangu- 
lar, to the parabolic, to the triangular, and then to the normal. The variance of 
the estimate of the mean then decreases in the case of double exponential. 

As to the variance of the estimate of standard deviation, the same table
shows that the variance of the estimate increases from the rectangular to the
parabolic, to the triangular, to the normal, and then to the double exponential.
For the U-shaped distribution, the variance of the estimate is greater than that
of the rectangular for n = 2 and 3. For n = 4 and 5, the variance becomes
smaller than that of the rectangular. However, working out the estimates and
their variances for n = 6 and 7, it has been found that the variance of the
estimate of standard deviation for the U-shaped becomes progressively smaller
than that of the rectangular. So it seems to the author that as n increases, the
variance of the estimate of the standard deviation of the U-shaped distribution
tends to be the least among the given distributions.

¹ The author wishes to thank Professor Frederick Mosteller for directing his attention
to this particular sequence.

The author wishes to acknowledge the kind help of Dr. B. G. Greenberg, 
under whose direction this work was done. 
PROBABILITY OF INDECOMPOSABILITY OF A RANDOM
MAPPING FUNCTION¹


By Leo Katz²
Michigan State University

Summary. Consider a finite set $\Omega$ of $N$ points and a single-valued function
$f(x)$ on $\Omega$ into $\Omega$. In case the mapping is one-to-one, it is a permutation of the
points of $\Omega$; we shall be concerned with more general mappings. Any mapping
function effects a decomposition of the set into disjoint, minimal, non-null
invariant subsets, as $\Omega = \omega_1 + \omega_2 + \cdots + \omega_k$, where $f(\omega_i) \subset \omega_i$ and $f^{-1}(\omega_i) \subset \omega_i$.
These subsets have been referred to as trees and as components of the mapping;
we shall say that $f$, as above, decomposes the set into $k$ components.

Metropolis and Ulam [1] defined a random mapping by a uniform probability
distribution over the $N^N$ sample points of $f(x)$ and posed the problem of finding
the expected number of components. Kruskal [2] subsequently solved this
problem. In this paper, we consider a related problem, namely, what is the
probability that a random mapping is indecomposable, i.e., that the minimal
non-null set $\omega$ for which $f(\omega) = \omega$ and $f^{-1}(\omega) = \omega$, is the whole set $\omega = \Omega$?

This problem is solved in general, as is, also, an analogous problem for a 
specialized random mapping of some interest in social psychology. Finally, we 
examine the asymptotic behavior of these probabilities. 


1. Indecomposability of a random mapping. A single-valued mapping specifies,
for each point $P_i$, its image point $P_{j_i}$, $j_i = 1, 2, \cdots, N$ (a point may map
into itself). A random mapping assigns, independently, to each $P_i$ one of the
image points $P_j$, $j = 1, 2, \cdots, N$, with equal probability $1/N$. The sample
space consists of the $N^N$ possible mappings, with uniform probability distribution.
To each mapping is associated a value of the random variable $k$, $k =
1, 2, \cdots, N$, the number of components. Those for which $k = 1$ are indecomposable.
We shall require, first, a characterization of the property of indecomposability,
second, a disjunctive and exhaustive categorization of those mappings
which possess this property, and, finally, an enumeration scheme within
each category.

In order to obtain a suitable characterization of indecomposability, we consider
that a single-valued mapping function takes any point of the (finite)
set into a second, the second into a third, etc., until, at some stage, a point is
taken into an earlier member of the sequence. At this stage, a cycle is formed;
the length of the cycle is the number of repetitions of the mapping required to
take any point of the cycle into itself. No point of a cycle can be mapped on
any point not of the cycle, but a point not of a cycle may map through a chained
sequence into a point of the cycle. Thus, a component of a mapping consists of
precisely one cycle, together with cycle-free chains terminating at points of the
cycle. This provides the required characterization:

Received November 8, 1954.

CHARACTERIZATION. A mapping function is indecomposable if and only if it 
generates only one cycle. 

Next, we may categorize indecomposable mappings according to the length
$m$ of the cycle contained in them. Finally, we subcategorize $m$-cycle indecomposable
mappings into sets according as the noncyclic elements are arranged
with $n_j$ requiring $j$ stages to be mapped into the cycle, $j = 1, 2, 3, \cdots$. This
subclassification corresponds to the nonzero, $p$-part, partitions of $(N - m)$, with
$p$ arbitrary.

We now view the indecomposable mapping as a directed graph, more precisely,
as a tree rooted in an $m$-cycle. The directed joins, one emanating from
each point, represent the mapping from point to image. In the following section,
we shall consider that the graph of an indecomposable mapping consists of a
central $m$-cycle, a first orbit of $n_1$ points connected by one-chains to the cycle,
a second orbit of $n_2$ points connected by one-chains to the points of the first
orbit and, hence, by two-chains to the points of the cycle, etc. An example of
such an indecomposable mapping with $m = 6$, $n_1 = 4$, $n_2 = 3$ is given in Figure 1.


Fig. 1. Example of Mapping $m = 6$, $n_1 = 4$, $n_2 = 3$


2. Probability that mapping is indecomposable. We proceed formally, at first,
to express the probability of indecomposability as the sum of compound probabilities
that the mapping produces exactly one cycle and the cycle is of length
$m$. These, in turn, are expressed as the sums of probabilities that the remaining
$M = N - m$ are arranged in nonempty orbits of $n_1, n_2, \cdots, n_p$, respectively,
for all possible such arrangements. It is convenient to give special treatment to
the number $m$ in the cycle itself. Consider the event $E_m(n_1, n_2, \cdots, n_p)$ that
a random mapping is indecomposable with parameters $m, n_1, \cdots, n_p$. The
probability of this event is the curiously linked expression


(1) $P\{E_m(n_1, n_2, \cdots, n_p)\} = \dbinom{N}{m, n_1, \cdots, n_p} \dfrac{(m-1)!}{N^m} \left(\dfrac{m}{N}\right)^{n_1} \left(\dfrac{n_1}{N}\right)^{n_2} \cdots \left(\dfrac{n_{p-1}}{N}\right)^{n_p},$


where $\binom{N}{m, n_1, \cdots, n_p}$ is the multinomial coefficient. The factorial in the second factor
of the right member represents the number of distinct cyclical arrangements
possible among the $m$ points of the inner cycle; succeeding factors represent the
possibilities of joins of points in an orbit to points of the next interior orbit.
With slight rearrangement of (1), the probability we seek is given by


(2) $\Pr\{\text{indecomposability}\} = \dfrac{N!}{N^N} \sum_{m=1}^{N} \left\{ \sum_{[M]_p} \dfrac{1}{m}\, \dfrac{m^{n_1} n_1^{n_2} \cdots n_{p-1}^{n_p}}{n_1!\, n_2! \cdots n_p!} \right\},$

where $[M]_p$ stands for the collection of nonempty, $p$-part partitions
$(n_1, n_2, \cdots, n_p)$ of $M$.
We now evaluate the expression in braces in (2) by the following lemma.
LEMMA.

$\sum_{[M]_p} \dfrac{1}{m}\, \dfrac{m^{n_1} n_1^{n_2} \cdots n_{p-1}^{n_p}}{n_1!\, n_2! \cdots n_p!} = \dfrac{N^{M-1}}{M!},$

where $N = M + m$ and $[M]_p$ is the class of nonzero distinct partitions $(n_1, \cdots, n_p)$
of $M$.
PROOF.³ We proceed indirectly by expanding the binomial in the right member
as


$(M + m)^{M-1} = \sum_{n_1=1}^{M} \dbinom{M-1}{n_1-1} m^{n_1-1} M^{M-n_1},$


or, letting $M_1 = M - n_1$ and simplifying,


(3a) $\dfrac{(M + m)^{M-1}}{M!} = \sum_{n_1=1}^{M} \dfrac{m^{n_1-1}}{(n_1-1)!}\, \dfrac{(M_1 + n_1)^{M_1-1}}{M_1!}.$


We note that the second factor in the summand of (3a) is of the same type
as the left member and may be similarly expanded. Letting $M_i = M_{i-1} - n_i$,
$i = 2, 3, \cdots$, we obtain, by iteration of (3a),


(3b) $\dfrac{(M + m)^{M-1}}{M!} = \sum_{n_1=1}^{M} \dfrac{m^{n_1-1}}{(n_1-1)!} \sum_{n_2=1}^{M_1} \dfrac{n_1^{n_2-1}}{(n_2-1)!} \cdots \sum_{n_p=1}^{M_{p-1}} \dfrac{n_{p-1}^{n_p-1}}{(n_p-1)!}\, \dfrac{(M_p + n_p)^{M_p-1}}{M_p!},$


with $p$ arbitrary, each chain of summations being continued until $M_p = 0$, where
the last factor reduces to $1/n_p$. But the summations in the right member are equivalent to
the sum over all $p$ and nonzero $p$-part partitions of $M$ and the summand is
that of the lemma, thus proving the lemma.


³ This short proof is partly due to J. S. Frame.
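The lemma can be checked by direct enumeration for small $M$ and $m$. The sketch below assumes that the "partitions" of the lemma are ordered, i.e. that $(n_1, \cdots, n_p)$ and its rearrangements count separately, which is what the orbit interpretation suggests; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def compositions(total):
    """Yield every ordered tuple of positive integers that sums to total."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest

def lemma_left(m, M):
    """Sum of (1/m) m^n1 n1^n2 ... n_{p-1}^np / (n1! ... np!) over all parts."""
    acc = Fraction(0)
    for parts in compositions(M):
        term = Fraction(1, m)
        prev = m
        for n in parts:
            term *= Fraction(prev ** n, factorial(n))
            prev = n
        acc += term
    return acc

def lemma_right(m, M):
    """N^(M-1) / M! with N = M + m, written so that M = 0 needs no special case."""
    N = M + m
    return Fraction(N ** M, N * factorial(M))

agree = all(lemma_left(m, M) == lemma_right(m, M)
            for m in range(1, 5) for M in range(0, 7))
print(agree)
```

With $m = 1$ and $M = 2$, for example, the two ordered partitions (2) and (1, 1) contribute 1/2 and 1, and $N^{M-1}/M! = 3/2$.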
The lemma and (2), upon changing the index of summation to $M = N - m$,
give immediately the principal theorem:


THEOREM. The probability that a random mapping on $N$ points is indecomposable
is


$\left[(N - 1)!/N^N\right] \sum_{M=0}^{N-1} N^M/M!.$
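For small $N$ the theorem can be compared with a direct count over all $N^N$ mappings, using the characterization that a mapping is indecomposable exactly when it has one cycle. This is only a sketch; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def prob_indecomposable(N):
    """Theorem: [(N-1)!/N^N] * sum over M = 0..N-1 of N^M / M!."""
    s = sum(Fraction(N ** M, factorial(M)) for M in range(N))
    return Fraction(factorial(N - 1), N ** N) * s

def count_cycles(f):
    """Number of cycles (hence components) of the mapping i -> f[i]."""
    state = [0] * len(f)      # 0 = unseen, 1 = on the current path, 2 = done
    cycles = 0
    for start in range(len(f)):
        path = []
        x = start
        while state[x] == 0:
            state[x] = 1
            path.append(x)
            x = f[x]
        if state[x] == 1:     # the walk re-entered its own path: a new cycle
            cycles += 1
        for y in path:
            state[y] = 2
    return cycles

def brute_force(N):
    """Fraction of all N^N mappings that have exactly one cycle."""
    hits = sum(1 for f in product(range(N), repeat=N) if count_cycles(f) == 1)
    return Fraction(hits, N ** N)

for N in range(1, 6):
    print(N, prob_indecomposable(N), brute_force(N))
```

For $N = 3$ both computations give 17/27.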


3. Hollow mapping. One realization of near-random mapping occurs in socio- 
metric testing. When, for example, N individuals in a group are each asked to in- 
dicate which one of the others is his best source of information, the result is such 
a mapping except that, if no individual is permitted to name himself, the mapping 
is “‘hollow”’ in the sense that the matrix representation of the graph has diagonal 
elements identically vanishing. If, otherwise, selection is random, the probability 
of equation (1) is modified for this case by replacing each $N$ in the denominator
by $(N - 1)$ and taking the outer summation of equation (2) from $m = 2$ to $m = N$.
With these adjustments, we have the following corollary.


COROLLARY. The probability that a hollow random mapping on $N$ points is
indecomposable is


$\left[(N - 1)!/(N - 1)^N\right] \sum_{M=0}^{N-2} N^M/M!.$
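The corollary can be checked in the same way by enumerating the $(N - 1)^N$ mappings in which no point maps into itself. The sketch below tests indecomposability through connectedness of the graph (a union-find pass over the joins), which is equivalent to the one-cycle characterization; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def prob_hollow(N):
    """Corollary: [(N-1)!/(N-1)^N] * sum over M = 0..N-2 of N^M / M!  (N >= 2)."""
    s = sum(Fraction(N ** M, factorial(M)) for M in range(N - 1))
    return Fraction(factorial(N - 1), (N - 1) ** N) * s

def is_connected(f):
    """True if the graph with joins i -> f[i] consists of a single component."""
    parent = list(range(len(f)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in enumerate(f):
        parent[find(i)] = find(j)
    return len({find(i) for i in range(len(f))}) == 1

def brute_force_hollow(N):
    """Fraction of hollow mappings (f[i] != i for all i) that are connected."""
    hollow = [f for f in product(range(N), repeat=N)
              if all(f[i] != i for i in range(N))]
    return Fraction(sum(is_connected(f) for f in hollow), len(hollow))

for N in range(2, 6):
    print(N, float(prob_hollow(N)), float(brute_force_hollow(N)))
```

For $N = 4$ the only decomposable hollow mappings are the three pairings into two 2-cycles, so the probability is 78/81, or about .96296.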


4. Computation and asymptotic probability. The probabilities of indecomposability,
of the theorem and the corollary above, might be expressed in more
compact form. However, as exhibited, it is apparent that the sum is a cumulative
probability of a Poisson variable with parameter $N$, except for a constant.
Molina's tables [3] are adequate for computation of the probability through $N =
100$. Thus,


(5) $\Pr\{\text{indecomposability}\} = \dfrac{(N - 1)!}{N^N}\, e^N P(N; N - 1),$


where $P(N; N - 1) = \sum_{M=0}^{N-1} e^{-N} N^M/M!$. For $N > 100$, using the Stirling
approximation for the factorial and the facts that $(1 - 1/N)^{N - 1/2} = e^{-1} + O(N^{-2})$
and that $P(N; N - 1) \to \tfrac12$, we obtain


(6) $\Pr\{\text{indecomposability}\} \approx \left(\dfrac{\pi}{2N}\right)^{1/2}, \quad N \text{ large}.$


Similarly, using the corollary, we have


(5h) $\Pr\{\text{indecomposability} \mid \text{hollow}\} = \dfrac{(N - 1)!}{(N - 1)^N}\, e^N P(N; N - 2),$


(6h) $\Pr\{\text{indecomposability} \mid \text{hollow}\} \approx e \left(\dfrac{\pi}{2(N - 1)}\right)^{1/2}, \quad N \text{ large}.$
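The exact probability (5) and the approximation (6) can be compared numerically for the general case. The sketch below works in floating point, with `math.lgamma` supplying the logarithm of $(N - 1)!$; the function names are hypothetical, and reading the Table I entry .12210 as the value for $N = 100$ is an assumption, since the row labels are not legible in the source.

```python
import math

def exact_general(N):
    """(5): [(N-1)!/N^N] * sum over M = 0..N-1 of N^M / M!, in floating point."""
    term, total = 1.0, 0.0
    for M in range(N):
        total += term            # term equals N^M / M! at this point
        term *= N / (M + 1)
    return math.exp(math.lgamma(N) - N * math.log(N)) * total

def asymptotic_general(N):
    """(6): (pi / 2N)^(1/2)."""
    return math.sqrt(math.pi / (2 * N))

for N in (10, 25, 100):
    print(N, round(exact_general(N), 5), round(asymptotic_general(N), 5))
```

At $N = 100$ the exact value is about .1221 against .1253 from (6), so the approximation is still a few per cent high at the upper end of the tabulated range.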
TABLE I

Probabilities of Indecomposability of a random mapping function in the general
and hollow cases

Column headings: N | P{I|G} | P{I|H} | N | P{I|G} | P{I|H}

[The values of N and part of the left-hand P{I|G} column are illegible in the source. Legible entries, in printed order:]

Left-hand block, P{I|H}: 1.00000, 1.00000, .96296, .92188, .88320, .84816, .81671, .78844, .76294, .73983, .71878, .69950, .68176, .66539, .65019, .63605, .62284; isolated entries .55853 and .59885.

Right-hand block:

 P{I|G} | P{I|H}
 .23372 | .54135
 .22562 | .52574
 .21831 | .51148
 .21169 | .49837
 .20564 | .48628
 .20009 | .47507
 .19497 | .46463
 .19023 | .45488
 .17976 | .43308
 .17086 | .41426
 .16318 | .39780
 .15646 | .38322
 .15052 |   …
 .14521 |   …
 .14043 |   …
 .13610 |   …
 .13215 |   …
 .12853 |   …
 .12519 |   …
 .12210 | .30626

 Large N: P{I|G} ≈ $(\pi/2N)^{1/2}$, P{I|H} ≈ $e(\pi/2(N - 1))^{1/2}$

The most interesting feature of this last result is that the probability in the
hollow case remains substantially larger than in the general case as $N$ increases.
This runs counter to standard sociometric folklore, which holds that the hollow
model may be uniformly replaced by the general model with small error when
$N$ is large.

Both probabilities approach zero fairly slowly (as $N^{-1/2}$). Table I presents the
exact probabilities as computed from (5) and (5h).


5. Notes on related work. After the present paper had been prepared, David 
Blackwell called the attention of the author to an unpublished memorandum by 
Rubin and Sitgreaves [4]. In the memorandum, different methods are used to 
obtain the theorem of Section 2 of this paper; the hollow mapping case is not 
considered. 

Using methods of this paper, Jay E. Folkert and the author have obtained and
will publish the probability distributions of the numbers of components of
single-valued and of multiple-valued mappings in the subcases in which mapping is
arbitrary or hollow. The distribution for the single-valued, arbitrary case is given
also in the memorandum cited above.


4 The author is indebted to Mr. William L. Harkness for these computations. 
A NECESSARY AND SUFFICIENT CONDITION FOR ADMISSIBILITY


By CHARLES STEIN 
Stanford University 


1. Summary. In Section 2 we give the usual definition for admissibility of a 
strategy in a two-person zero-sum game, and obtain a simple sufficient condition 
for admissibility of a strategy for the second player which is hardly more than 
a formal statement of a procedure frequently used in proving admissibility. 
In Section 3 we introduce the notion of strict admissibility, which is slightly 
stronger than admissibility, but equivalent to it in the case where the space 
of strategies of the second player is weakly compact in the sense of Wald. We 
then obtain a necessary and sufficient condition for strict admissibility, in the 
form of a condition on the upper values of a sequence of games associated with 
the original game. In Section 4 we show that, under the additional condition 
that the minimax theorem holds for certain associated games, the condition of 
Section 2 is necessary as well as sufficient. The results have a formal resemblance 
to those of Hodges and Lehmann [4]. 


2. Introduction. Let $A$ and $B$ be sets and $K$ a real-valued function on $A \times B$
such that for every $a \in A$


(1) $p(a) = \inf_b K(a, b) > -\infty.$


Following von Neumann [1], we refer to the triple (A, B, K) as a two-person 
zero-sum game, having in mind the situation where the first player chooses 
an element a of A, the second player an element b of B, the two choices being 
made simultaneously, and then the second player pays the first player the 
amount $K(a, b)$. The set $B$ is partially ordered by the relation $\preceq$ where $b_1 \preceq b_2$
means that, for every $a \in A$,


(2) $K(a, b_1) \le K(a, b_2).$

If this holds we say that $b_1$ is better than $b_2$. If, for all $a$,

(3) $K(a, b_1) = K(a, b_2),$

we write $b_1 \sim b_2$ and say that $b_1$ is equivalent to $b_2$. If $b_1 \preceq b_2$ but not $b_1 \sim b_2$,
we write $b_1 \prec b_2$ and say that $b_1$ is strictly better than $b_2$. We say that $b_1$ is
admissible if there exists no $b_2$ strictly better than $b_1$.
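When $A$ and $B$ are finite, these definitions can be applied mechanically to a payoff matrix. The sketch below is an illustration only: it assumes $K$ is given as a list of rows indexed by $a$ with columns indexed by $b$, and the function names and the example matrix are hypothetical.

```python
def better(K, b1, b2):
    """b1 is better than b2: K[a][b1] <= K[a][b2] for every row a."""
    return all(row[b1] <= row[b2] for row in K)

def strictly_better(K, b1, b2):
    """b1 is better than b2 and the two are not equivalent."""
    return better(K, b1, b2) and not better(K, b2, b1)

def admissible(K):
    """Columns b for which no other column is strictly better."""
    cols = range(len(K[0]))
    return [b for b in cols
            if not any(strictly_better(K, other, b) for other in cols)]

# Payments from the second player to the first; rows are a, columns are b.
K = [[1, 2, 1],
     [3, 3, 0]]
print(admissible(K))
```

Here column 2 is strictly better than column 0, which is in turn strictly better than column 1, so only column 2 is admissible.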
We shall need a few more definitions before we can indicate the principal 


result of this paper. The strategy }, is said to be e-Bayes with respect to a « A if 
(4) K(a, b:) & inf, K(a, b) + «, 


Received November 17, 1954. 
and to be Bayes if this holds for $\epsilon = 0$. If $\mathcal{A}$ is a $\sigma$-algebra of subsets of $A$ such
that for each $b$, $K(\cdot, b)$ is $\mathcal{A}$-measurable, and $\Xi$ is a convex set of probability
measures on $\mathcal{A}$, including at least all those measures, denoted by $[a]$, concentrated
at a single point $a \in A$, then the game $(\Xi, B, K')$ with


(5) $K'(\xi, b) = \int K(a, b)\, d\xi(a)$


will be called a convex extension of K. In order to make sure that this integral 
is defined we must make an additional assumption on K, and we shall assume 
K bounded below. A reasonable alternative might be the condition symmetric 
to (1), that is, 


(6) $\sup_a K(a, b) < \infty$ for all $b$.


THEOREM 1. If $b_1$ is such that for every $a_1 \in A$ and $\epsilon > 0$ there exist $\xi \in \Xi$ and
$\delta > 0$ such that $b_1$ is $\epsilon\delta$-Bayes with respect to $(1 - \delta)\xi + \delta[a_1]$, then $b_1$ is
admissible.

PROOF. Suppose $b_1$ is not admissible. Then there must exist $b_2$ which is strictly
better than $b_1$, that is,


(7) $K(a, b_2) \le K(a, b_1)$


for all $a$, with strict inequality for some $a$, say $a_1$. By assumption, there exist
$\delta > 0$ and $\xi \in \Xi$ such that


$K'((1 - \delta)\xi + \delta[a_1], b_1) \le \inf_b K'((1 - \delta)\xi + \delta[a_1], b) + \epsilon\delta$
$\le K'((1 - \delta)\xi + \delta[a_1], b_2) + \epsilon\delta$
$\le (1 - \delta) K'(\xi, b_1) + \delta K(a_1, b_2) + \epsilon\delta$.


The left side equals $(1 - \delta) K'(\xi, b_1) + \delta K(a_1, b_1)$, so that $K(a_1, b_1) \le K(a_1, b_2) + \epsilon$.
Since $\epsilon$ is arbitrary, $K(a_1, b_1) \le K(a_1, b_2)$,
which contradicts the hypothesis that (7) holds with strict inequality at $a_1$.

This theorem essentially follows the reasoning used by Blyth [2] and other 
authors in proving admissibility. In Section 4, assuming weak compactness of 
B in the sense of Wald [3], and assuming the minimax theorem to hold for a 
class of games associated with K’, we shall show that this condition is also 
necessary. The set $B$ is said to be weakly compact with respect to $K$ in the sense
of Wald if, for every sequence $\{b_i\}$, there exist $b_0$ and a subsequence $\{b_{i_j}\}$
such that, for all $a$,


(8) $\liminf_{j \to \infty} K(a, b_{i_j}) \ge K(a, b_0)$.

We observe that, by Fatou’s Lemma, if B is weakly compact with respect to 
K, and K is bounded below, then B is weakly compact with respect to K’. 

The necessity of the condition of Theorem 1 could perhaps be proved more 
quickly without the intermediate results of Section 3. However, the necessary 
and sufficient condition, valid under much weaker conditions which we obtain 
there, is likely to be of some interest. 
3. A necessary and sufficient condition for admissibility. In this section we
shall use the notation and assumptions of Section 2 through (2.4), and also
the definition of weak compactness (2.8). We shall also need the notion of
strict admissibility, slightly stronger than admissibility. The strategy $b_1$ is
said to be strictly admissible if for every $a_1 \in A$ and $\epsilon > 0$, there exists $\delta > 0$
such that, for every $b$ for which $K(a_1, b) \le K(a_1, b_1) - \epsilon$, there exists $a$ such
that $K(a, b) \ge K(a, b_1) + \delta$. It is clear that strict admissibility implies
admissibility.

THEOREM 2. If $b_0$ is admissible and $B$ is weakly compact with respect to $K$, then
$b_0$ is strictly admissible.

PROOF. Suppose $B$ is weakly compact in the sense of Wald and $b_0$ is not strictly
admissible. Then for some $a_0 \in A$ and some $\epsilon > 0$ there exists a sequence $\{b_i\}$
such that


(1) $K(a_0, b_i) \le K(a_0, b_0) - \epsilon$ for all $i = 1, 2, \ldots$,
(2) $\limsup_{i \to \infty} \sup_{a \in A} [K(a, b_i) - K(a, b_0)] \le 0$.


By the assumption of weak compactness there exist a subsequence $\{b_{i_j}\}$ and an
element $b'$ such that


(3) $\liminf_{j \to \infty} K(a, b_{i_j}) \ge K(a, b')$ for all $a$.


It follows that

(4) $\sup_{a \in A} [K(a, b') - K(a, b_0)] \le \sup_{a \in A} [\liminf_{j \to \infty} K(a, b_{i_j}) - K(a, b_0)]$
$= \sup_{a \in A} \liminf_{j \to \infty} [K(a, b_{i_j}) - K(a, b_0)]$
$\le \liminf_{j \to \infty} \sup_{a \in A} [K(a, b_{i_j}) - K(a, b_0)] \le 0$.

Similarly,

(5) $K(a_0, b') \le K(a_0, b_0) - \epsilon$,


so that $b'$ is strictly better than $b_0$. Thus $b_0$ is not admissible.
THEOREM 3. In order that $b_0$ be strictly admissible, it is necessary and sufficient
that for every $a_0$


(6) $\liminf_{\gamma \to \infty} \inf_b \sup_a \{K(a_0, b) - K(a_0, b_0) + \gamma[K(a, b) - K(a, b_0)]\} \ge 0$.


In order to simplify the writing we assume without essential loss of generality
that, for all $a$, $K(a, b_0) = 0$. Then (6) becomes


(7) $\liminf_{\gamma \to \infty} \inf_b \sup_a \{K(a_0, b) + \gamma K(a, b)\} \ge 0$.


Also $a_0$ may be taken as fixed throughout the proof.
PROOF OF NECESSITY. Suppose $b_0$ strictly admissible and let $\delta_\epsilon$ be the $\delta$ whose
existence is asserted in the definition of strict admissibility. Let $S_\epsilon$ be the set
of all $b$ such that

(8) $K(a_0, b) \le -\epsilon$

and $S_\epsilon'$ its complement. Then

(9) $\liminf_{\gamma \to \infty} \inf_b \sup_a [K(a_0, b) + \gamma K(a, b)]$
$= \liminf_{\gamma \to \infty} \min \{ \inf_{b \in S_\epsilon} [K(a_0, b) + \gamma \sup_a K(a, b)], \inf_{b \in S_\epsilon'} [K(a_0, b) + \gamma \sup_a K(a, b)] \}$
$\ge \liminf_{\gamma \to \infty} \min \{ \rho(a_0) + \gamma \delta_\epsilon, \; -\epsilon + \gamma \inf_b \sup_a K(a, b) \} \ge -\epsilon$.

The last step follows from the fact that, by the admissibility of $b_0$, for every $b$
there exists $a$ such that $K(a, b) \ge 0$, so that

$\inf_b \sup_a K(a, b) \ge 0$.


Since $\epsilon$ was arbitrary, this completes the proof of necessity.
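PROOF OF SUFFICIENCY. Supposing that (7) holds we have, for every $\epsilon > 0$,


(10) $0 \le \liminf_{\gamma \to \infty} \inf_b \sup_a [K(a_0, b) + \gamma K(a, b)]$
$\le \liminf_{\gamma \to \infty} \inf_{b \in S_\epsilon} \sup_a [K(a_0, b) + \gamma K(a, b)]$
$\le \liminf_{\gamma \to \infty} [\sup_{b \in S_\epsilon} K(a_0, b) + \gamma \inf_{b \in S_\epsilon} \sup_a K(a, b)]$
$\le \liminf_{\gamma \to \infty} [-\epsilon + \gamma \inf_{b \in S_\epsilon} \sup_a K(a, b)]$.


Consequently, there exists $\gamma_\epsilon > 0$ such that

(11) $-\tfrac{1}{2}\epsilon \le -\epsilon + \gamma_\epsilon \inf_{b \in S_\epsilon} \sup_a K(a, b)$.


Thus the definition of strict admissibility is satisfied with $\delta = \tfrac{1}{2}\epsilon / \gamma_\epsilon$.

The proof shows that (6) could have been stated with $\liminf$ replaced by $\lim$ or
by $\limsup$, or with $\ge$ replaced by $=$, or both.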

COROLLARY. If (6) holds for all $a_0$, then $b_0$ is admissible. If $B$ is weakly compact,
then the converse holds.

This is an immediate consequence of Theorems 2 and 3. 


4. Admissibility in the presence of the minimax theorem. In this section,
we suppose $K$ is bounded below and possesses a convex extension $(\Xi, B, K')$
as described around (2.5). We shall also suppose the minimax theorem applies
in (3.6) when $K$ is replaced by $K'$, that is,


(1) $\inf_b \sup_\xi \{K(a_0, b) - K(a_0, b_0) + \gamma[K'(\xi, b) - K'(\xi, b_0)]\}$
$= \sup_\xi \inf_b \{K(a_0, b) - K(a_0, b_0) + \gamma[K'(\xi, b) - K'(\xi, b_0)]\}$.

THEOREM 4. Under the above conditions, in order that $b_0$ be admissible it is
necessary and sufficient that for every $a_0$ and every $\epsilon > 0$ there exist $\xi \in \Xi$ and $\delta > 0$ such
that $b_0$ is $\epsilon\delta$-Bayes with respect to $(1 - \delta)\xi + \delta[a_0]$.

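PROOF. Using the fact that


(2) $\sup_\xi [K'(\xi, b) - K'(\xi, b_0)] = \sup_a [K(a, b) - K(a, b_0)]$


together with (1), we find that (3.6) is equivalent to

(3) $\liminf_{\gamma \to \infty} \sup_\xi \inf_b \{K(a_0, b) - K(a_0, b_0) + \gamma[K'(\xi, b) - K'(\xi, b_0)]\} \ge 0$.


If we let $\delta = 1/(\gamma + 1)$ and use the fact that


(4) $\frac{\gamma}{\gamma + 1} K'(\xi, b) + \frac{1}{\gamma + 1} K'([a_0], b) = K'\left(\frac{\gamma}{\gamma + 1}\xi + \frac{1}{\gamma + 1}[a_0], b\right)$,


we find that (3) is equivalent to

(5) $\liminf_{\delta \to 0} \frac{1}{\delta} \sup_\xi \inf_b [K'((1 - \delta)\xi + \delta[a_0], b) - K'((1 - \delta)\xi + \delta[a_0], b_0)] \ge 0$.


This is equivalent to the assertion that for every $\epsilon > 0$ there exist $\delta > 0$ and
$\xi_\epsilon \in \Xi$ such that


(6) $\inf_b K'((1 - \delta)\xi_\epsilon + \delta[a_0], b) \ge K'((1 - \delta)\xi_\epsilon + \delta[a_0], b_0) - \epsilon\delta$.

The theorem follows immediately.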
Note added in proof. I believe this theorem to be potentially useful, but cannot
now give any non-trivial examples. Attempts to apply the sufficiency often run 
into analytic difficulties. The necessity was useful heuristically in the recogni- 
tion of the inadmissibility of the usual estimate of the mean of a multivariate 
normal distribution of dimension greater than or equal to 3. (Abstract in Ann. 
Math. Stat., Vol. 26 (1955), p. 157; to appear in the Proceedings of the Third 
Berkeley Symposium on Mathematical Statistics and Probability). A result similar 
to Theorem 4 has been obtained independently by LeCam. 
A BIVARIATE SIGN TEST


By J. L. Hodges, Jr.
University of California, Berkeley 


1. Introduction. The sign test has proved to be a very useful means for judging
the significance of treatments. Suppose that on each of $n$ individuals (or pairs
of individuals) measurements are made under two conditions, for example,
before and after treatment (or on a treated and a control subject). Denote the
two measurements for the $i$th individual (or pair of individuals) by $x_i$ and $x_i'$.
We formulate the null hypothesis that $x_i$ and $x_i'$ are identically and independently
distributed, but wish to make no assumption concerning relations between the
distributions of $x_1, x_2, \ldots, x_n$, nor concerning relations between those of
$x_1', x_2', \ldots, x_n'$, save that each set is independent. The alternative to the null
hypothesis is that the second measurements $x_i'$ are generally shifted, with
respect to the first measurements $x_i$, in the same direction for all (or most) of
the individuals. The test is carried out by counting the number $S$ of the differences
$x_i' - x_i$ which have positive signs. Under the null hypothesis, $S$ is binomially
distributed with $p = \tfrac{1}{2}$, assuming there are no cases with $x_i = x_i'$,
or that such cases of equality are broken randomly. Under the alternative, $S$
would tend to have large values if the second measurements are generally increased
relative to the first, small values if they are decreased. We may then
reject for large $S$, small $S$, or either, according to the alternative against which
we wish the test to have power. The great advantage of the test, aside from its
simplicity, is the generality of the conditions under which it is valid.
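The univariate procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: ties are taken to be absent (the sketch raises an error rather than breaking them randomly), the two-sided level is computed by doubling the smaller binomial tail, and the name `sign_test` is hypothetical.

```python
from math import comb


def sign_test(first, second):
    """Return (S, two-sided p-value) for paired measurements.

    S counts the differences second - first that are positive; under the
    null hypothesis S is binomial with p = 1/2.
    """
    diffs = [s - f for f, s in zip(first, second)]
    if any(d == 0 for d in diffs):
        raise ValueError("ties are assumed absent in this sketch")
    n = len(diffs)
    S = sum(1 for d in diffs if d > 0)

    def cdf(k):
        # Pr{Binomial(n, 1/2) <= k}; an empty sum gives 0 for k < 0.
        return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

    lower = cdf(S)            # Pr{S' <= S}
    upper = 1.0 - cdf(S - 1)  # Pr{S' >= S}
    return S, min(1.0, 2 * min(lower, upper))


S_obs, p_two_sided = sign_test([1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 5])
```

In the usage line five of the six differences are positive, so the upper tail is $7/64$ and the doubled level is $14/64$.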

The present paper proposes a bivariate analog of the two-sided sign test, 
which can be applied when two quantities are measured on each individual. 
We now have measurements $x_i$ and $y_i$ in a first circumstance, $x_i'$ and $y_i'$ in a
second. Do the $4n$ measurements justify our concluding that the two circumstances
differ? The null hypothesis is that the bivariate distribution for $(x_i, y_i)$
is identical with that for $(x_i', y_i')$, and that these vectors are independent. The
alternative of interest is that in the second circumstance the bivariate distribu- 
tion has been shifted relative to the first, in generally the same direction for all 
individuals. The direction of this possible shift is, however, unknown. 

To illustrate, suppose we measure blood pressure and blood sugar before and 
after treatment with a new drug on a number of individuals. We wish to know 
whether the drug influences these quantities, but have no preconceived notion 
concerning the direction or relative amount of the influence on either quantity, 
should it exist. The joint distribution of the quantities has an unknown form, 
and is presumably different in different individuals. The quantities are pre- 
sumably dependent, but in an unknown way. 

If we knew the direction of a possible shift, it would be easy to reduce our 
problem to the sign test. We could simply project the vectors of differences 
$(x_i' - x_i, y_i' - y_i)$ onto the given direction, and count the number $S$ of projected
vectors having the given sense. Our problem arises just because we do not
have a given direction, but must derive one from the data. 

The idea of the proposed test is to consider all possible directions, and calcu- 
late S for each. Let M be the maximum of the values thus calculated. We shall 
use M as our test statistic, rejecting the null hypothesis if M is too large. That is, 
we shall judge that a shift has occurred if there is some direction in which most 
of the measurement pairs have shifted; we shall judge that no shift has occurred 
if the shifts are in various directions with no heavy concentration. 
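One way to compute $M$ from a sample is sketched below. It rests on an observation not spelled out in the text: the count $S$ for a direction changes only when the direction crosses a perpendicular to one of the difference vectors, so it suffices to evaluate $S$ at the midpoint of each arc between consecutive perpendiculars. The function name is hypothetical, and the difference vectors are assumed to be nonzero.

```python
import math


def bivariate_sign_statistic(diffs):
    """Return M, the maximum over directions u of #{i : d_i . u > 0}.

    diffs is a list of nonzero difference vectors (dx, dy).
    """
    two_pi = 2 * math.pi
    # Critical angles: the two directions perpendicular to each vector.
    crit = []
    for dx, dy in diffs:
        theta = math.atan2(dy, dx)
        crit.append((theta + math.pi / 2) % two_pi)
        crit.append((theta - math.pi / 2) % two_pi)
    crit.sort()

    best = 0
    for j, c in enumerate(crit):
        nxt = crit[(j + 1) % len(crit)]
        if j == len(crit) - 1:
            nxt += two_pi  # wrap around the circle
        mid = (c + nxt) / 2
        ux, uy = math.cos(mid), math.sin(mid)
        s = sum(1 for dx, dy in diffs if dx * ux + dy * uy > 0)
        best = max(best, s)
    return best


M_same_side = bivariate_sign_statistic([(1, 0), (1, 1), (0, 1)])
M_spread = bivariate_sign_statistic([(1, 0), (0, 1), (-1, -1)])
```

In the first usage line all three vectors lie in a common open half-plane, so $M = 3$; in the second no open half-plane contains all three, so $M = 2$.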

The distribution theory for M under the null hypothesis is worked out in the 
following sections. Presumably it would be desirable to generalize the proposed 
test to more than two quantities. The multivariate analog of the statistic M is 
easily seen, though in more than three dimensions it would be difficult to com- 
pute M from the sample, and its null distribution might be troublesome. 


2. Reduction to a combinatorial problem. We shall suppose that no two of the
$n$ vectors $(x_i' - x_i, y_i' - y_i)$ lie on the same line, and take the $n$ lines on which
these vectors lie as given, with all probability calculations conditional on the
given lines. Under the null hypothesis, the distribution of $(x_i' - x_i, y_i' - y_i)$
is the same as that of $(x_i - x_i', y_i - y_i')$, so that there is probability $\tfrac{1}{2}$ for the
$i$th vector to be oriented in each of its two possible senses. As the $n$ vectors are
independent, we conclude that the $2^n$ possible orientations of the vectors are all
equally likely.

It is easily seen that the value of M for a given set of orientations is independ- 
ent of the angles between the lines and of the lengths of the vectors. Therefore, 
for simplicity we may suppose that the lines are equally spaced and the vectors 
all are of unit length. We imagine a circle on whose circumference 2n equally 
spaced loci are given. We are to distribute n plus signs and n minus signs among 
these loci, subject to the condition that diametrically opposed signs are opposite 
in sense. We shall call such an arrangement a cycle. We think of a cycle as being 
rotatable about its center into 2n positions, each being itself a cycle. For each
position we count the number s of positive signs among the n uppermost signs; 
m is the maximum of the 2n values of s thus obtained. Our problem is to count 
the cycles having a given value of m. 

It is clear that $\tfrac{1}{2}n \le m \le n$. We shall denote $n - m$ by $k$; thus $k$ is the smallest
number of minus signs which can be uppermost. The operation of rotation
carries one cycle into another, generating equivalence classes of cycles. The
largest possible class has $2n$ members. Smaller classes are possible, since there
may exist cycles which are carried into themselves by a rotation through $r$
positions, $0 < r < 2n$. However, the smallest such $r$ must be of the form $rc = 2n$
where $c \ge 3$ is odd (since opposite signs are of opposite sense); thus cycles in an
equivalence class smaller than $2n$ will have $k \ge n/3$. As our interest is primarily
in the tail of the distribution ($k$ small), we shall simplify by restricting $k < n/3$,
whence we can assume every class to have $2n$ members.
To count the classes, we shall select from each class a representative member,
called the pattern for the class. This member is the unique one which satisfies
two conditions, which can be expressed in terms of the n uppermost signs.
These signs are arranged in a semi-circle, and we are particularly interested in
the signs forming a consecutive set of fewer than n signs at either extreme of the
semi-circle; we call such a set a (right or left) tail. The two conditions are:

(a) There is no right tail in which there is a majority of minus signs. 

(b) There is no left tail in which the plus signs are not in the majority. 

The conditions serve to insure that the pattern has the maximum number m 
of positive signs uppermost; if it were possible to rotate it into a position with 
more positive signs uppermost, there would have to be a tail with a majority 
of minus signs. The conditions also insure that only one pattern is selected from 
each class; if there were two representatives of the class, (i.e., a cycle appearing 
in two positions) one of these would contradict condition (a). In general it is not 
true that every class has a member satisfying these conditions (consider a cycle 
with alternating signs), but it is true under the restriction k < n/3. 


3. Counting the patterns. We may obtain a formula for the number P(n, k) 
of patterns most easily by identifying our problem with the classical problem 
of gambler’s ruin. A pattern, read from right to left, may be interpreted as the 
record of a penny tossing game in which a gambler with initial capital $h = n - 2k$,
playing against an adversary with unit initial capital, is ruined at the $n$th toss.
The probability of such ruin is on the one hand $P(n, k)/2^n$; but on the other
hand formulae for it are well known (see, for example, [1], p. 304, problem 6).
In fact,


(1) $P(n, k) = (w_h + w_{3h+2} + w_{5h+4} + \cdots) - (w_{h+2} + w_{3h+4} + w_{5h+6} + \cdots)$,


where $h = n - 2k$, and

$w_z = \frac{z}{n} \binom{n}{\frac{1}{2}(n - z)}$

is the number of ways in which a gambler with initial capital $z$ can be ruined
at the $n$th toss when playing against an infinitely rich adversary.

If we take advantage once more of the restriction $k < n/3$, only two terms
of (1) differ from zero, so that


(2) $P(n, k) = \frac{n - 2k}{n} \binom{n}{k} - \frac{n - 2k + 2}{n} \binom{n}{k - 1}$.


Let $Q(n, k)$ denote the number of patterns with at most $k$ minus signs uppermost.
Summing (2) we obtain


$Q(n, k) = \frac{n - 2k}{n} \binom{n}{k} = \binom{n - 1}{k} - \binom{n - 1}{k - 1}$.


Recalling that there are $2n$ cycles for each pattern and $2^n$ cycles in all, while
under the null hypothesis these $2^n$ cycles are equally likely, we find


$\Pr\{K \le k\} = (n - 2k) \binom{n}{k} / 2^{n-1}$.


The table gives values of $\Pr\{K \le k\}$ to 5D for $n = 1(1)30$, and $k < n/3$.
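The closed form can be compared with a direct enumeration of the $2^n$ cycles for small $n$. The sketch below is illustrative only; it reads the formula as $(n - 2k)\binom{n}{k}/2^{n-1}$, which is what $2n \, Q(n, k)/2^n$ gives, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def pr_K_le(n, k):
    """Closed form (n - 2k) C(n, k) / 2^(n-1), stated for 0 <= k < n/3."""
    if not 0 <= 3 * k < n:
        raise ValueError("formula is stated only for 0 <= k < n/3")
    return (n - 2 * k) * comb(n, k) / 2 ** (n - 1)


def brute_force_pr_K_le(n, k):
    """Enumerate all 2^n orientations and count cycles with n - m <= k."""
    count = 0
    for signs in product((1, -1), repeat=n):
        # Diametrically opposed loci carry opposite signs.
        cycle = list(signs) + [-s for s in signs]
        # m = largest number of plus signs among n consecutive loci.
        m = max(
            sum(1 for j in range(n) if cycle[(r + j) % (2 * n)] == 1)
            for r in range(2 * n)
        )
        if n - m <= k:
            count += 1
    return count / 2 ** n


check = all(
    abs(pr_K_le(n, k) - brute_force_pr_K_le(n, k)) < 1e-12
    for n in range(3, 10)
    for k in range(n)
    if 3 * k < n
)
```

The enumeration grows as $2^n$, so it is useful only as a check on small cases; the closed form is what one would use to regenerate table entries.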


Table of $\Pr\{K \le k\}$, for $k < n/3$.
k 





-04805 . ; . 72450 
-03044 . .59143 


01903. es 46575 
01175. .35578 .65057 
00718. 26474 .52605 
.00066 .00434 . . .19254 .41259 
.00038 .00260 . ‘ .13723 .31517 .58020 


.00021 .00155 . .09606 .23525 .46559 

.00012  .00092 . BS .06616 .17202 .36390 

.00007  .00054 . E 04491 .12350 .27789 .51490 

00004 .00031 .00186 | . .03008 .08722 .20786 .41055 
.00002 .00018 .00112| . .01991 .06067 .15263 .31987 










Although $\Pr\{K < n/3\}$ tends to 0 as $n \to \infty$, it does not fall below 5 percent
until $n = 72$, or below 1 percent until $n = 102$. If the test proves useful, it
may be desirable to consider the distribution of $K$ for $k \ge n/3$, where the results
are likely to be less simple and neat.
ON THE CONVERGENCE OF EMPIRIC DISTRIBUTION FUNCTIONS¹


By J. R. Blum
Indiana University 


1. Summary. Let $\mu$ be a probability measure on the Borel sets of $k$-dimensional
Euclidean space $E_k$. Let $\{X_n\}$, $n = 1, 2, \ldots$, be a sequence of $k$-dimensional
independent random vectors, distributed according to $\mu$. For each $n = 1, 2, \ldots$
let $\mu_n$ be the empiric distribution function corresponding to $X_1, \ldots, X_n$, i.e.,
for every Borel set $A \subset E_k$, we define $\mu_n(A)$ to be the proportion of observations
among $X_1, \ldots, X_n$ which fall in $A$.

Let $\mathcal{A}$ be the class of Borel sets in $E_k$ defined below. The object of this paper
is to prove that $P\{\lim_{n \to \infty} \sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)| = 0\} = 1$.


2. Introduction. Let $F(x)$ be a distribution function on the real line and let
$\{X_n\}$, $n = 1, 2, \ldots$, be a sequence of independent random variables distributed
according to $F$. For each $n = 1, 2, \ldots$ let $F_n(x)$ be the empiric distribution
function corresponding to $X_1, \ldots, X_n$. The well-known theorem of Glivenko-Cantelli
(see, e.g., Fréchet [1]) states that

$P\{\lim_{n \to \infty} \sup_{-\infty < x < \infty} |F_n(x) - F(x)| = 0\} = 1$.

Fortet and Mourier [2] have proved several theorems on the convergence of
empiric distribution functions in a separable metric space $E$. In particular, they
show that if $E$ is a Euclidean space and $\mu$ is a probability measure on $E$ which is
absolutely continuous with respect to Lebesgue measure, then


(2.1) $P\{\lim_{n \to \infty} \sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)| = 0\} = 1$,


where $\mathcal{A}$ is the collection of open half-spaces in $E$. Wolfowitz [3] proved that
(2.1) holds without any assumptions on $\mu$. In this note we prove that if $\mu$ is
absolutely continuous with respect to Lebesgue measure, then (2.1) holds for a
considerably more general class of sets.

To avoid repetition we shall assume from now on that every set considered is 
a Borel subset of $E_k$. Let $\mathcal{A}_1$ be the class of sets $A$ each of which possesses the
following property. If $x = (x_1, \ldots, x_k) \in A$ and $y = (y_1, \ldots, y_k)$ is such that
$y_i < x_i$ for $i = 1, \ldots, k$, then $y \in A$. Let $\mathcal{A}_j$, $j = 2, \ldots, 2^k$, be the $2^k - 1$
classes of sets which can be obtained by reversing, one at a time, the $k$ inequalities
occurring in the definition of $\mathcal{A}_1$. Let $\mathcal{A} = \bigcup_{j=1}^{2^k} \mathcal{A}_j$. In this note we shall prove
the following theorem.

THEOREM. If $\mu$ is absolutely continuous with respect to Lebesgue measure, then


$P\{\lim_{n \to \infty} \sup_{A \in \mathcal{A}} |\mu_n(A) - \mu(A)| = 0\} = 1$.


3. Proof of the theorem. In proving the theorem we shall restrict ourselves
to the class $\mathcal{A}_1$. The method of proof also applies to each of the classes
$\mathcal{A}_2, \ldots, \mathcal{A}_{2^k}$, and consequently, from elementary considerations, the theorem holds for $\mathcal{A}$.

The method of proof depends on the following lemma.

LEMMA 1. Let $\mathcal{B}$ be a class of sets and suppose for each $\rho > 0$ there exists a finite
class of sets $\mathcal{B}(\rho)$ such that for each $B \in \mathcal{B}$ there exist sets $B_1$ and $B_2$ in $\mathcal{B}(\rho)$ satisfying


i) $B_1 \subset B \subset B_2$,
ii) $\mu(B_2) - \mu(B_1) \le \rho$.


Then $P\{\lim_{n \to \infty} \sup_{B \in \mathcal{B}} |\mu_n(B) - \mu(B)| = 0\} = 1$.

The proof of the lemma is a direct consequence of the strong law of large 
numbers and is omitted here. 
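The omitted argument can be sketched as follows. This is one standard bracketing route and is offered only as a plausible reading of "a direct consequence of the strong law"; the symbol $\Delta_n(\rho)$ is introduced here for convenience and does not appear in the paper.

```latex
% Notation assumed: \Delta_n(\rho) = \max_{C \in \mathcal{B}(\rho)} |\mu_n(C) - \mu(C)|.
% For B_1 \subset B \subset B_2 with \mu(B_2) - \mu(B_1) \le \rho,
% monotonicity of \mu_n and \mu gives
\begin{align*}
\mu_n(B) - \mu(B)
  &\le \mu_n(B_2) - \mu(B_1)
   = [\mu_n(B_2) - \mu(B_2)] + [\mu(B_2) - \mu(B_1)]
   \le \Delta_n(\rho) + \rho, \\
\mu_n(B) - \mu(B)
  &\ge \mu_n(B_1) - \mu(B_2)
   = [\mu_n(B_1) - \mu(B_1)] - [\mu(B_2) - \mu(B_1)]
   \ge -\Delta_n(\rho) - \rho .
\end{align*}
% Hence
\[
\sup_{B \in \mathcal{B}} |\mu_n(B) - \mu(B)| \le \Delta_n(\rho) + \rho .
\]
% Since \mathcal{B}(\rho) is finite, the strong law of large numbers gives
% \Delta_n(\rho) \to 0 with probability one. Applying this for \rho = 1/j,
% j = 1, 2, \ldots, and intersecting the countably many events of probability
% one yields the conclusion of the lemma.
```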

In proving the theorem we shall assume that k = 2. It will be clear from the 
sequel that the method of proof applies to arbitrary k, although the details 
become vastly more complicated. 

Let $R$ be a closed square in the plane which is subdivided into $m^2$ subsquares
of equal area by dividing each side into $m$ equal length intervals. Let $\mathcal{A}_1(R)$
be the class of sets of the form $A \cap R$, with $A \in \mathcal{A}_1$. For each set $T \in \mathcal{A}_1(R)$ let
$B(T)$ be the set of boundary points of $T$ with the exception of those lying on the
south and west boundaries of $R$. If $x = (x_1, x_2) \in R$, we shall say that $x$ lies in
a subsquare if it lies in the interior or on the north or east boundary of the
subsquare. Let $N(T)$ be the number of distinct subsquares in which the points
of $B(T)$ lie. Then we have the following lemma.

LEMMA 2. For every $T \in \mathcal{A}_1(R)$, $N(T) \le 2m - 1$.

PROOF. We may assume that the coordinates of the corners of the subsquares
are of the form $(i, j)$ with $i = 0, \ldots, m$; $j = 0, \ldots, m$. Now consider the
$2m - 1$ lines of the form $f(x) = x + k$, with $k = -m + 1, -m + 2, \ldots, m - 1$.
By identifying each subsquare with the coordinates of its northeast corner it is
easily seen that through each subsquare passes one and only one of these lines.
Let $T \in \mathcal{A}_1(R)$. We shall show that on every line of the form $f(x) = x + k$ there
lies at most one point of $B(T)$. For suppose $x = (x_1, x_2)$ and $y = (y_1, y_2)$ are
two distinct points of $B(T)$, both lying on a line $f(x) = x + k$. Assume that
$x_i < y_i$, $i = 1, 2$. Then we can find a point $z = (z_1, z_2) \in T$, with $x_i < z_i$,
$i = 1, 2$. But this contradicts the fact that $x \in B(T)$. From this it follows that
each line $f(x) = x + k$ passes through at most one subsquare containing points
of $B(T)$. Since there are $2m - 1$ such lines the lemma follows.

Let $(i_1, j_1)$ be the coordinates of a corner of a subsquare with either $i_1 = 0$
or $j_1 = m$, and let $(i_2, j_2)$ be the coordinates of another corner of a subsquare
with either $i_2 = m$ or $j_2 = 0$. By a path $P$ in $R$ we shall mean a linear continuum
of points connecting $(i_1, j_1)$ with $(i_2, j_2)$ and satisfying in addition:

i) Every point $x = (x_1, x_2) \in P$ lies on the boundary of a subsquare of $R$.

ii) If $x = (x_1, x_2) \in P$ and $y = (y_1, y_2) \in P$, and if $x_1 \le y_1$, then $x_2 \ge y_2$.

By induction on $m$ it is easily verified that there are at most finitely many
paths $P$ in $R$. To each path $P$ we associate two sets $T_1(P)$ and $T_2(P)$ in $\mathcal{A}_1(R)$
with $B(T_1) = B(T_2) = P$ and such that $T_1(P)$ contains all points of $P$ and
$T_2(P)$ contains no points of $P$. Let $\mathcal{A}_{1,m}(R)$ be the class of all sets obtained in
this manner for all possible paths $P$. Then $\mathcal{A}_{1,m}(R)$ is clearly a finite class of
sets for each integer $m$.

Let $T \in \mathcal{A}_1(R)$, and let $\rho$ be a positive number. For any positive integer $m$
we may then choose two sets $T_1$ and $T_2$ in $\mathcal{A}_{1,m}(R)$ such that $T_1 \subset T \subset T_2$,
and such that if $T'$ and $T''$ are in $\mathcal{A}_{1,m}(R)$ and if $T_1 \subset T' \subset T \subset T'' \subset T_2$,
then $T_1 = T'$ and $T_2 = T''$. From the choice of $T_1$ and $T_2$ it is clear that $T_2 - T_1$
is contained in the set of subsquares which contain $B(T)$. Let $L(U)$ be the
Lebesgue measure of a set $U$. Then from Lemma 2 it follows that
$L(T_2 - T_1) \le L(R) N(T) / m^2 \le L(R) (2m - 1) / m^2$. Since $\mu$ is absolutely continuous with
respect to $L$, we may choose an integer $m$ such that $\mu(T_2 - T_1) < \rho$. Applying
Lemma 1 we obtain the following lemma.

LEMMA 3. $P\{\lim_{n \to \infty} \sup_{T \in \mathcal{A}_1(R)} |\mu_n(T) - \mu(T)| = 0\} = 1$.

Let $\rho$ be a positive number. Let $A \in \mathcal{A}_1$, and let $R$ be a square with
$\mu(R) > 1 - \rho/4$. Write $A = A_1 \cup A_2$, where $A_1 = A \cap R$, $A_2 = A \cap \bar{R}$, and where $\bar{R}$
is the complement of $R$. By virtue of Lemma 3 it suffices to show that
$\lim_{n \to \infty} \sup_{A_2} |\mu_n(A_2) - \mu(A_2)| = 0$ on a set of sample sequences of probability
one. Now consider any sample sequence in the set of probability one for which
$\lim_{n \to \infty} \mu_n(\bar{R}) = \mu(\bar{R})$. Choose $n$ so large that $\mu_n(\bar{R}) < \rho/2$. Since
$0 \le \mu_n(A_2) \le \mu_n(\bar{R})$ and $0 \le \mu(A_2) \le \mu(\bar{R})$, we obtain $|\mu_n(A_2) - \mu(A_2)| < \rho/2$, uniformly in
$A_2 \subset \bar{R}$, and the proof of the theorem is complete.

It appears to be a reasonable conjecture that the theorem is true without 
the condition of absolute continuity. One can easily construct examples which 
show that the method used in this note will no longer apply in the general 
situation. It would be of some interest to extend the result to the general case. 
CALCULATION OF EXACT SAMPLING DISTRIBUTION OF RANGES
FROM A DISCRETE POPULATION¹


By Irving W. Burr
Purdue University 


1. Introduction. The exact sampling distribution for ranges is known for but 
few populations, and general information on moments of the range is incomplete. 
This note gives a method for calculating the exact sampling distribution for 
discrete universes having a finite range and approximating those for populations 
with an infinite range. 


2. Derivation. Consider a random variable $X$ defined on integers $a$ to $b$,
both finite. Let $p_i$ be the probability that $X$ is $i$, and $p(R)$ be the probability
that the range takes the value $R$. Then for a sample of $n$ $X$'s from the population
(drawn with replacement) we have


(1) $p(R) = \sum_{i=a}^{b-R} \sum_{r=1}^{n-1} \sum_{s=1}^{n-r} \frac{n!}{r!\, s!\, (n - r - s)!}\, p_i^r\, p_{i+R}^s\, (p_{i+1} + \cdots + p_{i+R-1})^{n-r-s}$,


since the summand contains at least one $X$ at $i$ and at least one $X$ at $i + R$
and those $X$'s not at these values are all between, and the summation is over all
possible such samples. To obtain a more useful form we let


(2) $M(i, R) = \sum_{j=i}^{i+R} p_j$.


Then

$p(R) = \sum_{i=a}^{b-R} \sum_{r=1}^{n-1} \sum_{s=1}^{n-r} \frac{n!}{r!\, s!\, (n - r - s)!}\, p_i^r\, p_{i+R}^s\, M^{n-r-s}(i + 1, R - 2)$

$= \sum_{i=a}^{b-R}$ [terms of $M^n(i, R)$ containing at least one $i$ and at least one $i + R$].


To get the desired terms of $M^n(i, R)$, we first subtract from it all of those terms
which fail to contain any $i + R$, namely, $M^n(i, R - 1)$. Then we also subtract
off those which fail to contain any $i$, namely $M^n(i + 1, R - 1)$. But these two
expressions overlap to the extent of $M^n(i + 1, R - 2)$, that is, terms with
neither $i$ nor $i + R$. So this must be added back on. Thus we have


(3) $p(R) = \sum_{i=a}^{b-R} [M^n(i, R) - M^n(i, R - 1) - M^n(i + 1, R - 1) + M^n(i + 1, R - 2)]$.
To systematize calculation, another form is desirable. Let


(4) $C_R = \sum_{i=a+1}^{b-R-1} M^n(i, R)$,

(5) $E_R = M^n(a, R) + M^n(b - R, R)$.

Then we have

(6) $p(R) = C_R + E_R - 2C_{R-1} - E_{R-1} + C_{R-2}$.


Formulas (3) and (6) are appropriately modified for $R = 0$, $1$, $b - a - 1$, and
$b - a$.
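Formula (3) can be evaluated directly and checked against enumeration of all samples for a small case. The sketch below is illustrative: it handles the boundary values of $R$ by treating $M(i, R)$ as an empty sum (zero) whenever $R < 0$, which is an assumption about how the "appropriate modifications" work out, and the function names are hypothetical. The list `p` holds $p_a, \ldots, p_b$.

```python
from itertools import product


def range_distribution(p, n):
    """Return q with q[R] = P(range = R) for a sample of n, via formula (3)."""
    width = len(p) - 1  # b - a
    prefix = [0.0]
    for x in p:
        prefix.append(prefix[-1] + x)

    def M(i, R):
        # Sum of p[i], ..., p[i + R]; an empty sum (R < 0) is taken as 0.
        if R < 0:
            return 0.0
        return prefix[i + R + 1] - prefix[i]

    out = []
    for R in range(width + 1):
        total = 0.0
        for i in range(width - R + 1):
            total += (M(i, R) ** n - M(i, R - 1) ** n
                      - M(i + 1, R - 1) ** n + M(i + 1, R - 2) ** n)
        out.append(total)
    return out


def brute_force_range_distribution(p, n):
    """Enumerate every ordered sample of size n (small cases only)."""
    out = [0.0] * len(p)
    for sample in product(range(len(p)), repeat=n):
        prob = 1.0
        for j in sample:
            prob *= p[j]
        out[max(sample) - min(sample)] += prob
    return out


small = [0.2, 0.5, 0.3]
exact = range_distribution(small, 3)
brute = brute_force_range_distribution(small, 3)

# The near-normal population of the example in Section 4 (X = 0, ..., 10).
normal_like = [.005, .015, .050, .115, .195, .240, .195, .115, .050, .015, .005]
q = range_distribution(normal_like, 5)
```

The prefix sums play the role of the table of sums of consecutive frequencies described in Section 3; each $M(i, R)$ is then a single subtraction.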


3. Calculation. In computing the p(R), the universe probabilities can best be 
listed as integer frequencies, as small as possible. Then sums of consecutive 
frequencies, two at a time, three at a time, etc., are formed, the resulting table 
being of the same form as a table of differences. Then the $C_R$ and $E_R$ are found
by forming sums of nth powers of these table entries. The appropriate modifica- 
tions of (6) are made by omitting terms naturally absent from this table. 


4. An Example. Formula (6) enables us to study the effect on ranges of non- 
normality in the population. Thus we may compare the following two distribu- 
tions: One a discrete distribution with probabilities approximately proportional 
to normal curve areas and the other approximately proportional to those of a 
well-skewed Pearson Type III. 





xX 0 1 2 3 4 5 6 7 8 9 10 11 
Bi) secnes ie de 005 .015 .050 .115 .195 .240 .195 .115 .050 .015 .005 .000 
es nen ies 6 hee Ts. Ae pee, ee 4g ee ee, ee, 





The respective characteristics are 
$\mu = 5.00$   $\sigma = 1.71$   $\alpha_3 = 0$   $\alpha_4 = 3.02$
$\mu = 3.45$   $\sigma = 1.99$   $\alpha_3 = .99$   $\alpha_4 = 4.21$


The respective distributions of range n = 5 are the following: 





R 0 1 2 3 4 5 6 7 8 9 10 il 
GD) itis, -001 .031 .146 .239 .251 .179 .096 .040 .013 .003 .0005 


a 001 .028 .114 .203 .221 .180 .117 .063 .030 .020 .020 .002 


The characteristics are respectively 
Bae = 3.93 Cr = 1.53 a3 Al a= 3.01 


. Bre = 4.44 or = 1.94 a = .73 a = 3.47 


It can be seen that there is much less difference in skewness in the distributions 
of R than in the original populations. The R distributions are in fact quite
similar if allowance is made for the difference in population standard deviations. 
Hence we can have quite a bit of confidence in using normal curve constants 
when making control charts for ranges for moderately skewed populations 
and small sample sizes. 




THE STOCHASTIC CONVERGENCE OF A FUNCTION OF SAMPLE
SUCCESSIVE DIFFERENCES¹


By Lionel Weiss
University of Virginia 


1. Summary and introduction. Let $f(x)$ be a bounded density function over
the finite interval $[A, B]$ with at most a finite number of discontinuities. Let
$X_1, X_2, \ldots, X_n$ be independent chance variables each with the density $f(x)$.
Define $Y_1 \le Y_2 \le \cdots \le Y_n$ as the ordered values of $X_1, X_2, \ldots, X_n$, and
$T_i$ as $Y_{i+1} - Y_i$. Also define $R_n(t)$ as the proportion of the variates $T_1, \ldots, T_{n-1}$
not greater than $t/(n - 1)$. We shall denote $[1 - \int_A^B f(x) e^{-t f(x)}\, dx]$ by $S(t)$,
and $\sup_{t \ge 0} |R_n(t) - S(t)|$ by $V(n)$. Then it is shown that as $n$ increases, $V(n)$
converges stochastically to zero. The relation of this result to other results is
discussed.


2. Proof of the stochastic convergence of V(n) to zero. 

LEMMA 1. If for each given $t$, $R_n(t)$ converges stochastically to $S(t)$ as $n$ increases,
then $V(n)$ converges stochastically to zero.

PROOF. We must show that for any given positive numbers $\epsilon$ and $\delta$, there is a
positive integer $N(\epsilon, \delta)$ such that if $n > N(\epsilon, \delta)$, then $P[V(n) < \epsilon] > 1 - \delta$.
We can find a finite set of values $t_0 < t_1 < \cdots < t_s$ such that

$S(t_0) < \tfrac{1}{2}\epsilon$,  $1 - S(t_s) < \tfrac{1}{2}\epsilon$,  $S(t_{i+1}) - S(t_i) < \tfrac{1}{2}\epsilon$,  $i = 0, 1, \ldots, s - 1$.


Also, by the hypothesis of the lemma and other familiar considerations, we can
find a positive integer, say $N(\epsilon, \delta)$, such that if $n > N(\epsilon, \delta)$,


$P[|R_n(t_i) - S(t_i)| < \tfrac{1}{2}\epsilon$ for $i = 0, \ldots, s] > 1 - \delta$.

But then the lemma is proved, for it is easily verified that if $|R_n(t_i) - S(t_i)| < \tfrac{1}{2}\epsilon$
simultaneously for $i = 0, \ldots, s$, then $|R_n(t) - S(t)| < \epsilon$ simultaneously for
all $t \ge 0$.
LEMMA 2. Let $X_1, \ldots, X_n$ be independent chance variables each with a uniform
distribution on $[0, 1]$. Let $M$ denote the number of these variables falling in the closed


Received August 6, 1954. 


1 Research under a grant from the Institute for Research in the Social Sciences, Uni- 
versity of Virginia. 
interval $[C, D]$, where $0 \le C < D \le 1$, and let $Y_1 \le Y_2 \le \cdots \le Y_M$ denote the
ordered values of the variables in $[C, D]$. Define $W_0 = Y_1 - C$, and $W_i = Y_{i+1} - Y_i$
for $i = 1, \ldots, M - 1$. Finally, define $L(n, t)$ as the total number of values of
$W_1, \ldots, W_{M-1}$ which are not greater than $t/(n - 1)$ for a given $t \ge 0$. Then
$L(n, t)/(n - 1)$ converges stochastically to $(D - C)[1 - e^{-t}]$ as $n$ increases.

PROOF. We denote $(D - C)$ by $G$, and by $K(n, t)$ the total number of
$W_0, \ldots, W_{M-1}$ not greater than $t/(n - 1)$. Clearly, the lemma will be proved
if we can show that $K(n, t)/(n - 1)$ converges stochastically to $G(1 - e^{-t})$
as $n$ increases. The distribution of $M$ is binomial, with parameters $G$ and $n$,
and the joint conditional density of $Y_1, \ldots, Y_M$ given $M$ is $M!/G^M$ in the
region $C \le Y_1 \le \cdots \le Y_M \le D$, and zero elsewhere. Thus the joint conditional
density of $W_0, \ldots, W_{M-1}$ given $M$ is $M!/G^M$ in the region $W_i \ge 0$
and $\sum_{i=0}^{M-1} W_i \le G$.

Define $Z_i$ to be 1 if $W_i \le t/(n - 1)$, and zero otherwise. By the symmetry
of the joint conditional distribution of $W_0, \ldots, W_{M-1}$, $EK(n, t) =
E\{M \cdot E[Z_0 \mid M]\}$. The conditional density of $W_0$ given $M$ is $M(G - w)^{M-1}/G^M$
for $0 < w \le G$. Thus $E[Z_0 \mid M] = 1 - (1 - t/((n - 1)G))^M$, assuming $G \ge
t/(n - 1)$, which involves no loss of generality, since $G$ is fixed and we are
interested in what happens as $n$ increases. By routine manipulations of the
moment generating function of $M$, we find that

$$EK(n, t) = E\left\{M\left[1 - \left(1 - \frac{t}{(n-1)G}\right)^M\right]\right\}
= nG - \left[1 - \frac{t}{n-1}\right]^{n-1}\left[nG - \frac{nt}{n-1}\right].$$

From this, we find that $E[K(n, t)/(n - 1)]$ approaches $G(1 - e^{-t})$ as $n$
increases. Next we examine


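$$E\left\{\left[\frac{K(n,t)}{n-1}\right]^2\right\}
= \frac{1}{(n-1)^2} E\left[\sum_{i=0}^{M-1}\sum_{j=0}^{M-1} Z_i Z_j\right]
= \frac{1}{(n-1)^2} E\left[\sum_{i=0}^{M-1} Z_i^2\right] + \frac{1}{(n-1)^2} E\left[\sum_{i \ne j} Z_i Z_j\right].$$

But $\sum Z_i^2 = \sum Z_i$, and from above we have that $E[\sum Z_i/(n - 1)]$
approaches $G(1 - e^{-t})$ as $n$ increases. Therefore $E\{[K(n, t)/(n - 1)]^2\}$ has the
same limit as

$$\left(\frac{1}{n-1}\right)^2 E\left[\sum_{i \ne j} Z_i Z_j\right] = \left(\frac{1}{n-1}\right)^2 E\{M(M - 1) \cdot E(Z_0 Z_1 \mid M)\}.$$

This last equality holds because of the symmetry of the distribution of
$W_0, \ldots, W_{M-1}$. The joint conditional density of $W_0, W_1$ given $M$ is
$M(M - 1)(G - w_0 - w_1)^{M-2}/G^M$ for $w_0, w_1 \ge 0$ and $w_0 + w_1 \le G$. Thus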











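$$E[Z_0 Z_1 \mid M] = 1 - 2\left(1 - \frac{t}{(n-1)G}\right)^M + \left(1 - \frac{2t}{(n-1)G}\right)^M,$$

provided $G \ge 2t/(n - 1)$, which involves no loss of generality. Therefore
$E\{[K(n, t)/(n - 1)]^2\}$ has the same limit as

$$\left(\frac{1}{n-1}\right)^2 E\left\{M(M-1)\left[1 - 2\left(1 - \frac{t}{(n-1)G}\right)^M + \left(1 - \frac{2t}{(n-1)G}\right)^M\right]\right\}$$
$$= \frac{n}{n-1}\left[G^2 - 2\left(G - \frac{t}{n-1}\right)^2\left(1 - \frac{t}{n-1}\right)^{n-2} + \left(G - \frac{2t}{n-1}\right)^2\left(1 - \frac{2t}{n-1}\right)^{n-2}\right].$$

This last expression approaches $G^2[1 - 2e^{-t} + e^{-2t}] = [G(1 - e^{-t})]^2$ as $n$
increases. But this proves Lemma 2, since the variance of $K(n, t)/(n - 1)$
approaches zero as $n$ increases.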
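
The conclusion of Lemma 2 can be checked numerically. The following is a minimal simulation sketch, not part of the original argument: the sample size, the interval $[C, D] = [0.2, 0.7]$, the value $t = 1.5$, and the function names are all illustrative assumptions.

```python
import math
import random


def spacing_fraction(n, c, d, t, rng):
    """Return L(n, t) / (n - 1) for one sample of n uniform variables on [0, 1]."""
    inside = sorted(x for x in (rng.random() for _ in range(n)) if c <= x <= d)
    # W_i = Y_{i+1} - Y_i, i = 1, ..., M - 1: gaps between neighbours inside [C, D].
    gaps = [b - a for a, b in zip(inside, inside[1:])]
    threshold = t / (n - 1)
    return sum(1 for w in gaps if w <= threshold) / (n - 1)


def lemma2_limit(c, d, t):
    """Limit stated in Lemma 2: (D - C)(1 - e^{-t})."""
    return (d - c) * (1.0 - math.exp(-t))


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = spacing_fraction(n=200_000, c=0.2, d=0.7, t=1.5, rng=rng)
limit = lemma2_limit(0.2, 0.7, 1.5)
print(f"simulated {estimate:.4f}  limit {limit:.4f}")
```

For a large sample the two printed values should be close, which is what stochastic convergence suggests; a single run is of course only an illustration, not a proof.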

With a few simple changes in notation, Lemma 1 serves to show that
$\sup_{t \ge 0} |L(n, t)/(n - 1) - G(1 - e^{-t})|$ converges stochastically to zero as $n$
increases. Also, when $G = 1$, Lemmas 1 and 2 prove that $V(n)$ converges
stochastically to zero for the special case where $f(x) = 1$ on the interval $(0, 1)$.

Now we turn to the proof that $V(n)$ converges to zero in the general case.
We denote $\int_A^x f(u)\,du$ by $F(x)$. By the assumptions about $f(x)$ listed in Section 1,
given any positive number $\gamma$, we can break the interval $[A, B]$ into a finite
number $k(\gamma)$ of subintervals $(a_0, a_1), \ldots, (a_{k(\gamma)-1}, a_{k(\gamma)})$, with $a_0 = A$ and
$a_{k(\gamma)} = B$, such that in the interior of each subinterval $(a_i, a_{i+1})$, $f(x)$ is
continuous, and for any $x$ in the subinterval, $|f(x) - f(a_i)| < \gamma$ for $i = 0, \ldots,
k(\gamma) - 1$. Choose any particular subinterval $(a_i, a_{i+1})$, and let $Q_1 \le Q_2 \le \cdots \le
Q_M$ denote the ordered values of those variables $X_1, \ldots, X_n$ which fall in $(a_i, a_{i+1})$,
while $T_j$ shall denote $Q_{j+1} - Q_j$ for $j = 1, \ldots, M - 1$. Denote $F(Q_{j+1}) -
F(Q_j)$ by $W_j$. Then, defining $L_i(n, t)$ in terms of $W_1, \ldots, W_{M-1}$ as in Lemma
2, with the subscript $i$ to show that we are dealing with the interval $(a_i, a_{i+1})$,
we have that $\sup_{t \ge 0} |L_i(n, t)/(n - 1) - \{F(a_{i+1}) - F(a_i)\}(1 - e^{-t})|$
converges stochastically to zero as $n$ increases. (This is so because $F(X_j)$ has the
rectangular distribution over $(0, 1)$ for any $j$.) By construction, we have


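$$W_j = F(Q_{j+1}) - F(Q_j) = T_j\,(f(a_i) + \theta_j), \qquad |\theta_j| < \gamma.$$

Therefore, if $T_j \le t/(n - 1)$, then $W_j \le (f(a_i) + \gamma)t/(n - 1)$; conversely,
if $W_j \le (f(a_i) - \gamma)t/(n - 1)$, then $T_j \le t/(n - 1)$.
Then, letting $K_i(n, t)$ denote the number of the values $T_j$ which are not greater
than $t/(n - 1)$, we have

$$L_i(n, t(f(a_i) - \gamma)) \le K_i(n, t) \le L_i(n, t(f(a_i) + \gamma)).$$

Using $R_n(t)$ as defined in Section 1, this becomes

$$\frac{1}{n-1}\sum_{i=0}^{k(\gamma)-1} L_i(n, t(f(a_i) - \gamma)) \le R_n(t)
\le \frac{1}{n-1}\sum_{i=0}^{k(\gamma)-1} L_i(n, t(f(a_i) + \gamma)) + \frac{k(\gamma)}{n-1}.$$

Given any positive values $\epsilon$, $\delta$, we first choose $\gamma$ so small that

$$\left|\sum_{i=0}^{k(\gamma)-1} \{F(a_{i+1}) - F(a_i)\}\,e^{-t(f(a_i) + \bar{\gamma})} - \int_A^B f(x)\,e^{-t f(x)}\,dx\right| < \tfrac{1}{2}\epsilon$$

for $\bar{\gamma}$ equal to either $\gamma$ or $-\gamma$. Then we choose $N(\epsilon, \delta)$ so large that if $n > N(\epsilon, \delta)$,
then $k(\gamma)/(n - 1) < \tfrac{1}{4}\epsilon$, and also

$$P\left\{\sup_{t \ge 0}\left|\frac{L_i(n, t)}{n-1} - \{F(a_{i+1}) - F(a_i)\}(1 - e^{-t})\right| < \frac{\epsilon}{4k(\gamma)},\ i = 0, \ldots, k(\gamma) - 1\right\} > 1 - \delta.$$

But then if $n > N(\epsilon, \delta)$,

$$P\left\{\sum_{i=0}^{k(\gamma)-1} \{F(a_{i+1}) - F(a_i)\}\left(1 - e^{-t(f(a_i) - \gamma)}\right) - \tfrac{1}{4}\epsilon \le R_n(t)
\le \sum_{i=0}^{k(\gamma)-1} \{F(a_{i+1}) - F(a_i)\}\left(1 - e^{-t(f(a_i) + \gamma)}\right) + \tfrac{1}{2}\epsilon\right\} > 1 - \delta,$$

or $P\{|R_n(t) - S(t)| < \epsilon\} > 1 - \delta$. This shows that for any given $t$, $R_n(t)$
converges stochastically to $S(t)$. Then by Lemma 1, $V(n)$ converges stochastically
to zero.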

The same results hold with only slight modifications in the argument when
$A = -\infty$ and/or $B = \infty$, provided that there exist finite numbers $A'$ and
$B'$, with $A' < B'$, such that $f(x)$ is nondecreasing in the interval $(-\infty, A')$
and is nonincreasing in the interval $(B', \infty)$.
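
As an illustration of the unbounded case, the sketch below compares $R_n(t)$ with $S(t)$ for the density $f(x) = e^{-x}$ on $(0, \infty)$, which is nonincreasing. Two things are assumptions read off from how Section 2 uses the symbols rather than quoted from Section 1: $R_n(t)$ is taken to be the proportion of the $n - 1$ successive differences not exceeding $t/(n - 1)$, and $S(t)$ is taken to be $1 - \int f(x) e^{-t f(x)}\,dx$. The function names and the sample size are hypothetical.

```python
import math
import random


def r_n(sample, t):
    """Proportion of the n - 1 successive differences not exceeding t / (n - 1)."""
    ys = sorted(sample)
    n = len(ys)
    threshold = t / (n - 1)
    return sum(1 for a, b in zip(ys, ys[1:]) if b - a <= threshold) / (n - 1)


def s_exponential(t):
    """S(t) = 1 - integral of f(x) exp(-t f(x)) dx for f(x) = exp(-x), x > 0.

    The substitution u = exp(-x) turns the integral into (1 - exp(-t)) / t.
    """
    return 0.0 if t == 0 else 1.0 - (1.0 - math.exp(-t)) / t


rng = random.Random(1)  # fixed seed so the run is repeatable
sample = [rng.expovariate(1.0) for _ in range(100_000)]
for t in (0.5, 1.0, 2.0, 4.0):
    print(f"t={t}: R_n(t)={r_n(sample, t):.4f}  S(t)={s_exponential(t):.4f}")
```

The two columns should agree to within sampling error at every $t$, in line with the uniform convergence asserted for $V(n)$.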


3. Relation to other results. The stochastic convergence of certain functions
of $T_1, \ldots, T_{n-1}$ can be proved simply by the use of these results. For example,
Sherman [1] studied the chance variable $\omega_n$ defined as


$$\omega_n = \tfrac{1}{2}\sum_{i=1}^{n-1}\left|T_i - \frac{1}{n-1}\right| + \tfrac{1}{2}\,|Y_1 - A| + \tfrac{1}{2}\,|B - Y_n|.$$


Let us assume that $A$ is the least upper bound of all numbers $a$ such that $F(a) =
0$, and $B$ is the greatest lower bound of all numbers $b$ such that $F(b) = 1$. Then
as $n$ increases, $|Y_1 - A|$ and $|B - Y_n|$ converge stochastically to zero. Thus
$\omega_n$ converges stochastically to a constant if and only if $U_n = \tfrac{1}{2}\sum_{i=1}^{n-1} |T_i -
1/(n - 1)|$ converges stochastically to the same constant. We can write $U_n$ as
$S_n + V_n$ where


$$S_n = \tfrac{1}{2}\sum_{i:\,T_i \le 1/(n-1)}\left\{\frac{1}{n-1} - T_i\right\}, \qquad
V_n = \tfrac{1}{2}\sum_{i:\,T_i > 1/(n-1)}\left\{T_i - \frac{1}{n-1}\right\}.$$

But $\tfrac{1}{2}\sum_{i=1}^{n-1}\{T_i - 1/(n - 1)\} = V_n - S_n$ converges stochastically to
$\tfrac{1}{2}(B - A - 1)$. Thus $U_n$ converges stochastically to a constant if and only if
$S_n$ converges stochastically to a constant. We can write

$$S_n = \tfrac{1}{2}\int_0^1 (1 - t)\,dR_n(t).$$

Integrating by parts, we find that $S_n = \tfrac{1}{2}\int_0^1 R_n(t)\,dt$. By the result proved in
Section 2 this last expression converges stochastically to

$$\tfrac{1}{2}\int_0^1\left[1 - \int_A^B f(x)\,e^{-t f(x)}\,dx\right]dt = \tfrac{1}{2}\left[1 + \int_A^B e^{-f(x)}\,dx - (B - A)\right].$$

Therefore $\omega_n$ converges stochastically to $\tfrac{1}{2}(1 + A - B) + \int_A^B e^{-f(x)}\,dx$. For the
special case $A = 0$ and $B = 1$, this is essentially the result contained in theorems
3 and 4 of [1].
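
The limit $\tfrac{1}{2}(1 + A - B) + \int_A^B e^{-f(x)}\,dx$ can be illustrated by simulating $U_n$ directly. This is a sketch under stated assumptions: the density $f(x) = 2x$ on $(0, 1)$ is a hypothetical second example that is assumed, not verified here, to satisfy the conditions of Section 1, and the sample size and function name are arbitrary.

```python
import math
import random


def u_n(sample):
    """U_n = (1/2) * sum_i |T_i - 1/(n - 1)| over the n - 1 successive differences."""
    ys = sorted(sample)
    n = len(ys)
    return 0.5 * sum(abs((b - a) - 1.0 / (n - 1)) for a, b in zip(ys, ys[1:]))


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 100_000

# Uniform density on (0, 1): A = 0, B = 1, f = 1, so the limit is exp(-1).
uniform_value = u_n([rng.random() for _ in range(n)])
uniform_limit = math.exp(-1.0)

# Density f(x) = 2x on (0, 1), sampled as sqrt(U): the limit is the
# integral of exp(-2x) over (0, 1), that is (1 - exp(-2)) / 2.
linear_value = u_n([math.sqrt(rng.random()) for _ in range(n)])
linear_limit = (1.0 - math.exp(-2.0)) / 2.0

print(f"uniform: U_n={uniform_value:.4f}  limit={uniform_limit:.4f}")
print(f"f=2x:    U_n={linear_value:.4f}  limit={linear_limit:.4f}")
```

The uniform case reproduces the constant $e^{-1}$ of the special case $A = 0$, $B = 1$; the second case shows how the limit depends on $f$.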


REFERENCE 


[1] B. Sherman, "A random variable related to the spacing of sample values," Ann. Math.
Stat., Vol. 21 (1950), pp. 339-361.


Note added in proof. Professor Julius Blum has pointed out that Lemma 2
holds with the words "converges stochastically" replaced by "converges with
probability one." Then it is easily seen that all the results above hold when this
replacement is made.


(Abstracts of papers presented at the Chapel Hill meeting of the Institute, April 22-23, 1956)


1. Estimation of Location and Scale Parameters by Order Statistics from 
Singly and Doubly Censored Samples. Part I. The Normal Distribution 
up to Samples of Size 10. A. E. Sarhan and B. G. Greenberg, University
of North Carolina.


The variances and covariances of the order statistics for samples of sizes $\le 20$ from a
normal distribution were calculated to 10 decimal places from Teichroew’s tables of the 
expected value of the product of two order statistics. By the use of these values, and with 
the table of expected values of Rosser, the best linear estimates of the mean and standard 
deviation were calculated from singly and doubly censored samples up to samples of size 10. 
This was accomplished by applying the method of least squares to the linear combination 
of the ordered known observations to obtain unbiased estimates with minimum variance. 
The variances of the estimates were also calculated. An alternative linear estimate was 
derived for larger values of n which can be used to obtain estimates from doubly censored 
samples. 


2. An Application of Chung's Lemma to the Kiefer-Wolfowitz Stochastic
Approximation Procedure. Cyrus Derman, Syracuse University.


Let $M(x)$ be a strictly increasing regression function for $x < \theta$, and strictly decreasing
regression function for $x > \theta$. Kiefer and Wolfowitz (Ann. Math. Stat., Vol. 23 (1952),
pp. 462-466) suggested a recursive scheme for estimating $\theta$. They proved, under certain
regularity conditions, that their scheme converges stochastically to $\theta$. Their conditions
exclude the case $M(x) = K - K'(x - \theta)^2$ where $K$ and $K'$ are constants ($K' > 0$).
Conditions, which do not exclude the above case, are given here for their scheme to converge
stochastically to $\theta$. Under stronger conditions (the above case still not excluded)
convergence to the normal distribution is proved. The main tool used in the analysis is a lemma
due to Chung (Ann. Math. Stat., Vol. 25 (1954), pp. 463-483).


3. Simplified Estimators Based on Order Statistics. (Preliminary Report.) 
Benjamin Epstein, Wayne University.


Best linear unbiased estimates based on order statistics have been given recently by 
A. E. Sarhan [Ann. Math. Stat., Vol. 25 (1954), pp. 317-328] for the mean and standard 
deviation of a number of distributions. It is assumed in that paper that all observations 
are known. In a paper given at the Berkeley meeting in December, 1954, Sarhan considered 
the same estimation problem in the case where some of the ordered observations may be 
missing. Here we give unbiased estimators which are much simpler in the sense that they 
can be expressed in terms of only a few of the order statistics about which we have informa- 
tion. Efficiency of the suggested estimators is high for small sample sizes. 


4. Distribution of the Difference Between the Two Largest Sample Values.
(Preliminary Report.) A. Zinger and J. St-Pierre, University of Montreal.


A decision procedure to select the population with the largest mean, proposed by R. C. 
Bose and J. St.-Pierre (Ann. Math. Stat., Vol. 25 (1954), p. 813), involves the auxiliary 
statistic $y = z_{(0)} - z_{(1)}$, where $z_{(0)}$ and $z_{(1)}$ are respectively the largest and second largest
values in a sample of n + 1 variates. The distribution of this statistic is obtained with a 
method simpler than the one already used by the senior author. The final result involves 
iterated integrals of the normal density over simple limits. The general form can be easily 
reduced to neat expressions in the case of lower dimensions. The densities of 3, 4, and 5 
dimensions have been extensively tabulated and a recursion formula established between 
the densities. The establishment of a recursion formula in the general case is being 
worked on. 


5. Some Continuous Monte Carlo Methods for the Dirichlet Problem. Mervin
E. Muller, Cornell University.


Monte Carlo techniques are introduced using stochastic models which are Markov proc- 
esses. This material includes the N-dimensional spherical, general spherical, and general 
Dirichlet domain processes. These processes are proved to converge with probability 1 and
thus yield direct statistical estimates of the solution to the N-dimensional Dirichlet prob- 
lem. The results are obtained without requiring any further restrictions on the boundary 
or the function defined on the boundary in addition to those required for the existence and 
uniqueness of the solution to the Dirichlet problem. A detailed study is made for the N-di- 
mensional spherical process. This includes a study of the order of the average number of 
steps required for convergence. Asymptotic confidence intervals are obtained. When 
computing effort is measured in terms of the order of the average number of steps required 
for convergence, the often made conjecture that the computing effort of a Monte Carlo 
procedure should be a linear function of the dimensionality of the problem is shown to be 
true for the cases considered. Comments are included regarding the application of these 
processes on digital computers. Truncation methods are suggested.


6. On the Distribution of the Number of Successes in Independent Trials.
Wassily Hoeffding, University of North Carolina.


Let $S$ be the number of successes in $n$ independent trials. Let $p_i$ be the probability of
success in the $i$th trial. The problem is considered of finding the maximum or the minimum
of the expected value of a function of $S$ when $E(S) = np$ is fixed, $0 < p < 1$. It is well
known that the variance of $S$ attains its maximum when the $p_i$ are all equal. It is shown:
(i) for any two integers $b$, $c$ such that $0 \le b \le np \le c \le n$ the probability $P(b \le S \le c)$
attains its minimum if and only if all the $p_i$ are equal, unless $b = 0$, $c = n$; (ii) for any
strictly convex function $g$ the expected value $Eg(S)$ attains its maximum if and only if
all the $p_i$ are equal. The maximum and the minimum of $P(S \le c)$, $0 \le c \le n$, are
determined. These results are obtained with the aid of some theorems concerning the extrema
of $Eg(S)$, where $g$ is an arbitrary function. For example, the maximum and the minimum
of $Eg(S)$ are attained at points $(p_1, \ldots, p_n)$ whose coordinates take on at most three
different values, only one of which is distinct from 0 and 1. Statistical applications of (i)
and (ii) are pointed out.
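
The variance statement and result (i) can be checked on a small case. The sketch below is illustrative only: it computes the exact distribution of $S$ by a standard convolution recursion, and the two probability vectors (both with $E(S) = 2$) and the function names are hypothetical choices, not taken from the paper.

```python
def success_count_pmf(probs):
    """Exact distribution of S, the number of successes in independent trials."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def variance(pmf):
    """Variance of the count whose distribution is pmf."""
    mean = sum(k * m for k, m in enumerate(pmf))
    return sum((k - mean) ** 2 * m for k, m in enumerate(pmf))


def interval_probability(pmf, b, c):
    """P(b <= S <= c)."""
    return sum(pmf[b:c + 1])


equal = success_count_pmf([0.5, 0.5, 0.5, 0.5])
unequal = success_count_pmf([0.2, 0.8, 0.3, 0.7])  # same E(S) = np = 2

print(variance(equal), variance(unequal))
print(interval_probability(equal, 1, 3), interval_probability(unequal, 1, 3))
```

With $b = 1 \le np = 2 \le c = 3$, the equal-probability case gives the larger variance and the smaller interval probability, as the abstract states.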


7. On the Solution of Truncated and Censored Sample Estimating Equations 
for Normal Populations. A. C. Cohen, Jr., University of Georgia.


To obtain maximum likelihood estimates of the mean and standard deviation of a nor- 
mally distributed population from doubly truncated and from doubly censored samples, 
it is necessary to carry out the simultaneous solution of a pair of rather complicated non- 
linear estimating equations. In this paper, iterative techniques for solving these equations 
are examined, and a procedure is developed which yields solutions of specified accuracy 
with less computational effort than required by other methods previously employed. A 
chart has been prepared which, for doubly truncated samples, permits a quick graphic 
solution to a degree of accuracy that is adequate for many purposes, and which provides 
a good first approximation for subsequent improvement through iteration when greater 
accuracy is demanded. A chart has also been devised to permit a quick graphic solution 
in the case of singly censored samples. 


8. The Modified Mean Square Successive Difference and Related Statistics. 
Seymour Geisser, University of North Carolina.


In estimating the variance of a normal population, one uses the sample variance because 
of its optimum properties. In certain cases where there is an indeterminable trend in the 
data, it has been thought useful to estimate the variance by another statistic, namely, the 
mean square successive difference, the mean of the squared first differences, which under 
certain conditions, eliminates a good deal of the trend and is less biased than the sample 
variance. An explicit form of the exact distribution of this statistic seems, at least for the 
present, too difficult to obtain. However, by applying the device of Durbin and Watson, 
that is, by dropping from the mean square successive difference the middle term for an 
even number of observations and the two middle terms for the odd case, it is found that 
the quadratic form has double roots, thus making it possible to obtain the exact distribu- 
tion in terms of elementary functions. In addition, one defines analogues of the Student t 
and the Fisher F using similarly modified statistics and proceeds to derive their exact 
distributions when the observations are independent and in a specific dependency case 
which has several properties in common with the stationary Markov process.


9. The Distribution of the Ratios of Certain Quadratic Forms in Time Series.
(By Title.) Seymour Geisser, University of North Carolina.


In testing the hypothesis that successive members of a series of observations are serially 
correlated, a number of statistics have been proposed by various statisticians. R. L. Ander- 
son gave the first exact distribution of a serial correlation coefficient using a circular defini- 
tion. J. Durbin and G. Watson gave the exact distributions of several other statistics using 
double root methods. In this paper the work of Durbin and Watson has been extended for 
a non-null case of one of their statistics. Also, by introducing a new model, the exact
distribution of a modified form of the von Neumann ratio has been derived in the non-null
case. It has also been shown that this ratio provides a "best" test for the parameter
involved.


10. The "Inefficiency" of the Sample Median for Many Familiar Symmetric
Distributions. J. T. Chu, University of North Carolina.


If the pdf of a certain distribution is symmetric and has an absolute maximum at the
point of symmetry, a lower bound for var $\tilde{x}$, the variance of the sample median $\tilde{x}$ of a sample
of size $2n + 1$, is $(2n + 1)/(2n + 3)$ multiplied by the variance of the asymptotic
distribution of $\tilde{x}$ (which is normal). Therefore if sample size is not too small, the asymptotic
variance of $\tilde{x}$ is for all practical purposes a lower bound for var $\tilde{x}$. If $\tilde{x}$ is asymptotically less
efficient than $\bar{x}$, it is probable that $\tilde{x}$ is less efficient than $\bar{x}$ for most finite samples as well.
For many symmetric distributions familiar to statisticians, such as triangular, Student's
$t$, symmetric $\beta$, and Cauchy type distributions ($f(x) = C_a/(1 + |x|^a)$, $-\infty < x < \infty$,
$a \ge 4.65$), not counting normal and rectangular distributions, it is shown that $\tilde{x}$ is for most
sample sizes less efficient than $\bar{x}$.


11. On Some Stochastic Models of Behavioral Interaction of Organization
Theory. David Rosenblatt, American University.


This paper treats certain stochastic models of behavioral interaction which constitute 
applications of a general approach to a calculus of behavior. Participants or groups in 
organizations are viewed as entities provided with a stochastic preference (or threshold) 
apparatus; entities engage in adaptive or reactive behavior by adjustment of the stochastic 
processes governing their own activities. Transition probabilities are modified in accord
with experience of ‘‘relative success” of the entities in accord with certain criteria, e.g., 
organizational norms or observed actions of other entities. Modes of behavior are sequen- 
tially reinforced or inhibited as a result of the moves of entities in the course of interaction. 
Various types of memory structure are explicitly introduced. The "performance
characteristics" of a given structure of interaction may be summarized by the expected
individual and joint probability distributions of behavioral activity of each entity at each
interaction transaction $\tau_k$ ($k = 1, 2, \ldots$). Algorithms are developed for the
determination of these "performance characteristics." For certain parametric characterizations
($r$ entities, $n_j$ decision alternatives and $m_j$ preference valuation alternatives for the $j$th
entity, $j = 1, 2, \ldots, r$), the algorithms lead to closed-form expressions. In the simplest
cases, these become systems of linear difference equations. Many of the asymptotic results 
of stochastic learning theory may be readily obtained by specialization of the present 
models. (Work supported by the Office of Naval Research.) 


12. On Inverting a Class of Patterned Matrices, Part I. S. N. Roy and A. E.
Sarhan, University of North Carolina.


In this note, inverses are given of a class of patterned matrices that occur in different 
sectors of statistics, e.g., least squares solutions relating to problems of estimation of
population parameters by ordered or unordered observations, analysis of variance and
covariance, response surfaces, etc. The actual examples given here are illustrative and will
be followed up later by other examples. In the technique given here of obtaining such 
inverses, use is made of the fact that (i) a non-singular square matrix has a unique inverse 
and (ii) for the class of patterned matrices considered it is possible to guess a form for the 
inverse with a few unknown (and thus flexible) parameters which could then be determined 
by equating to the identity matrix the product of the original matrix and the inverse that 
is guessed. At the moment the guess is just intuitive, but the authors believe there is a 
deeper calculus behind the whole thing, which may emerge later and thus make the in- 
version of such matrices an entirely trivial problem. 
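
The guess-and-verify technique can be illustrated on one familiar pattern. The abstract does not name the matrices treated, so the choice below, $aI + bJ$ with $J$ the all-ones matrix, is a hypothetical example, and the function names are illustrative. Guessing an inverse of the same form $cI + dJ$ and equating the product to the identity gives $ac = 1$ and $ad + bc + nbd = 0$.

```python
def patterned(n, diag, off):
    """n x n matrix diag*I + off*J, where J is the all-ones matrix."""
    return [[diag + off if i == j else off for j in range(n)] for i in range(n)]


def guessed_inverse(n, a, b):
    """Inverse of a*I + b*J, guessed to have the same pattern c*I + d*J.

    Equating (a*I + b*J)(c*I + d*J) to I gives a*c = 1 and
    a*d + b*c + n*b*d = 0, which determine c and d below.
    """
    if a == 0 or a + n * b == 0:
        raise ValueError("a*I + b*J is singular")
    c = 1.0 / a
    d = -b / (a * (a + n * b))
    return patterned(n, c, d)


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


product = matmul(patterned(4, 2.0, 0.5), guessed_inverse(4, 2.0, 0.5))
for row in product:
    print([round(v, 10) for v in row])
```

Because a non-singular matrix has a unique inverse, confirming that the product is the identity is enough to justify the guessed form.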


13. Convergence Properties of a General Stochastic Approximation Process.
(Preliminary Report.) Donald Burkholder, University of North Carolina.


Theorem. Let $\{R_n\}$ be a sequence of Borel measurable functions, $\theta$, $\sigma^2$, $c$, $d$, $x_1$ real numbers,
$Q$ a function from the positive numbers to the natural numbers, and $\{a_n\}$ a positive number
sequence such that: (i) for each natural number $n$ and each real number $x$ there is a random
variable $Z_n(x)$ such that $EZ_n(x) = R_n(x)$, $\mathrm{Var}[Z_n(x)] \le \sigma^2$, and $|R_n(x)| \le c + d|x|$; (ii)
if $0 < \epsilon < \infty$ and $0 < \delta_1 < \delta_2 < \infty$, then $(x - \theta)R_n(x) > 0$ for $|x - \theta| > \epsilon$, $n > Q(\epsilon)$,
and $\sum a_n [\inf_{\delta_1 \le |x - \theta| \le \delta_2} |R_n(x)|] = \infty$; (iii) $\sum a_n^2 < \infty$. Then the sequence $\{x_n\}$ of random
variables defined recursively by $x_{n+1} = x_n - a_n Z_n(x_n)$ converges to $\theta$ with probability one. The
proof involves methods similar to those used by Blum, Ann. Math. Stat., Vol. 25 (1954),
pp. 382-386. Some immediate corollaries to the theorem are: (1) The Robbins-Monro process
converges with probability one (Blum's Theorem 1). (2) The Kiefer-Wolfowitz process
converges with probability one (under conditions less restrictive than those heretofore
published; for instance, a regression function $M$ where $M(x) = e^{-x^2}$ or $M(x) = -x^2$ is
permissible). (3) There exists a strongly consistent sequence of estimates of the mode of a
density function under fairly general conditions. (4) There exists a strongly consistent
sequence of estimates of a root of a regression equation even when the variances around
the regression line may not exist. The theorem, and hence also each of the corollaries (1),
(2), and (3), has been generalized to the case where the number $\theta$ does not exist uniquely.
This permits, for instance, the use of the Robbins-Monro process in the problem of
estimating a quantile of a distribution function when the quantile is not unique.
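
The recursion $x_{n+1} = x_n - a_n Z_n(x_n)$ can be sketched in a few lines. Everything specific below is an assumption made for illustration: the step sizes $a_n = 1/n$, the regression function $R(x) = x - \theta$ with $\theta = 3$, the unit-variance normal noise, and the function names. These choices satisfy conditions (i) to (iii) of the theorem but are not taken from the paper.

```python
import random


def stochastic_approximation(z, x1, steps, rng):
    """Iterate x_{n+1} = x_n - a_n * Z_n(x_n) with the step sizes a_n = 1/n."""
    x = x1
    for n in range(1, steps + 1):
        a_n = 1.0 / n  # sum of a_n diverges while sum of a_n^2 converges
        x = x - a_n * z(x, rng)
    return x


THETA = 3.0  # hypothetical root, used only for this illustration


def noisy_observation(x, rng):
    """Z_n(x): unbiased observation of R(x) = x - THETA with unit variance."""
    return (x - THETA) + rng.gauss(0.0, 1.0)


rng = random.Random(3)  # fixed seed so the run is repeatable
estimate = stochastic_approximation(noisy_observation, x1=-10.0, steps=20_000, rng=rng)
print(f"estimate {estimate:.3f}  theta {THETA}")
```

For this linear $R$ and $a_n = 1/n$ the error after $N$ steps equals minus the average of the $N$ noise terms, so the estimate settles near $\theta$ at the usual $1/\sqrt{N}$ rate.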


14. Distribution of Rounding Off Errors in Some Numerical Processes, Part I. 
A. E. Sarhan, University of North Carolina.


The distributions of rounding-off errors in a product, quotient, raising to a power process, 
several combinations of these processes, and other special cases are derived. The moment 
generating functions and the first four moments are calculated. Expressions for the sig- 
nificance points at a given level are provided. 


15. On a Measure of the Information Provided by an Experiment. (Preliminary
Report.) D. V. Lindley, University of Cambridge and University of
Chicago.


An experiment consists in the observation of a random variable $x$ with probability
density $f(x \mid \theta)$, where $\theta$ is an unknown parameter. Let $p(\theta)$ be the probability density of $\theta$,
expressing the knowledge of $\theta$ prior to performing the experiment. Then the average amount
of information provided by the experiment is defined to be $\iint f(x \mid \theta)\,p(\theta) \log\{f(x \mid \theta)/p(x)\}\,dx\,d\theta$,
where $p(x) = \int f(x \mid \theta)\,p(\theta)\,d\theta$. This definition is suggested by the corresponding
definition of Shannon's in connection with the rate of transmission of information in
communication engineering. It is shown that it is always nonnegative, is not reduced by
consideration of sufficient statistics alone, and if $x$ and $y$ are independent random variables
for each $\theta$, then the experiment in which $y$ is observed is more informative if carried out
before $x$ is observed than if carried out after $x$ has been observed. The definition enables
comparisons to be made of different experiments but these comparisons are unlike those
considered by Blackwell in that the losses in pursuing various courses of action are not
considered. The ideas are therefore more relevant to the inference problem than the
decision problem. Examples of the use of the definition, in particular for multivariate normal
densities, are considered.
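
A discrete analogue of the definition makes the quantity easy to compute. In the sketch below the integrals are replaced by sums over finitely many values of $\theta$ and $x$; that replacement, the example likelihood tables, and the function name are assumptions made for illustration only.

```python
import math


def average_information(prior, likelihood):
    """Sum over theta and x of f(x|theta) p(theta) log{f(x|theta) / p(x)}.

    prior[k] is p(theta_k); likelihood[k][x] is f(x | theta_k).
    """
    n_outcomes = len(likelihood[0])
    # p(x) = sum over theta of f(x | theta) p(theta)
    marginal = [sum(p * row[x] for p, row in zip(prior, likelihood))
                for x in range(n_outcomes)]
    total = 0.0
    for p, row in zip(prior, likelihood):
        for x, f in enumerate(row):
            if p > 0 and f > 0:  # terms with zero mass contribute nothing
                total += f * p * math.log(f / marginal[x])
    return total


prior = [0.5, 0.5]
useless = average_information(prior, [[0.3, 0.7], [0.3, 0.7]])   # x ignores theta
noisy = average_information(prior, [[0.9, 0.1], [0.2, 0.8]])
perfect = average_information(prior, [[1.0, 0.0], [0.0, 1.0]])   # x reveals theta
print(useless, noisy, perfect)
```

An experiment whose outcome does not depend on $\theta$ carries zero information, one that reveals $\theta$ exactly carries $\log 2$ under this two-point prior, and a noisy experiment falls in between, consistent with the nonnegativity stated in the abstract.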


16. A Comparison Between Alternative Techniques Using Supplementary
Information in Sample Survey Design. (Preliminary Report.) En Maupy
Sarp, North Carolina State College.


Three alternative methods of incorporating the advance information available on a 
variable X in the design of finite population sample surveys to estimate aggregate or mean 
values for a variable Y are studied. For a given sample size n and ignoring cost, the systems 
are (a) stratification with s = n/2 strata, (b) sampling without replacement with unequal 
selection probabilities such that the probability of including units $u_i$, $u_j$ together in the
sample is $P(u_i u_j) = n(n - 1)X_i X_j\{1/(T_x - X_i) + 1/(T_x - X_j)\}/2T_x$, where $T_x = \sum_{i=1}^{N} X_i$
and the estimator used is $\hat{y} = \sum_{i=1}^{n} y_i/P(u_i)$, (c) ratio estimate. Formulas for the mean
square errors of the estimators are derived for both linear and curvilinear relationships
between $X$ and $Y$. Exact comparison for $n = 2$, using discrete counterparts of some Pearson
type III distributions for $X$, showed that (b) is superior to (c) except for $\rho_{xy}$ very close
to 1. Approximate comparisons were obtained for $n > 2$ assuming large $N$ and continuous
type III distributions. Variance with stratification was approximated by using the uniform
distribution for $X$ within the first $(s - 1)$ strata. Method (a) was found to be superior to
(b) and (c) except when $c_y$ (the coefficient of variation of $Y$) is in the neighbourhood of $c_x$.
In certain instances the issue depends on $\rho_{xy}$ alone; in others, on combination of $\rho_{xy}$ and $c_y$.


17. The Canonical Distribution of the Non-central Rectangular Co-ordinates. 
Miss Aleyamma George, University of North Carolina and University
of Travancore. 


This paper is concerned with a matrix method of (a) deriving the canonical distribution 
of the non-central rectangular coordinates directly from the probability law for random 
samples from a p-variate normal population for the cases (i) one non-zero root and (ii) 
two non-zero roots for general p and (b) using this to obtain the canonical non-central 
Wishart distribution obtained by T. W. Anderson and M. A. Girshick for the same cases. 


18. Confidence Interval Estimation for the Parameters of a Rectangular and 
an Exponential Population in Terms of Complete or Censored Samples. 
(Preliminary Report.) (By Title.) S. N. Roy and A. E. Sarhan, University
of North Carolina.


In previous papers by the second author point estimation in the above situations was 
discussed. In the present paper, using the techniques given in previous papers by the first 
author, confidence intervals are given for parameters and certain statistically important
parametric functions in the case of rectangular and exponential populations (both general
and special forms) in terms of complete samples. A method is also indicated of generalizing 
to the case of censored samples and to certain other populations as well. 


19. Some Generalizations of Analysis of Variance and Covariance to the Case
of Discrete Variates or of Grouping in Qualitative Categories. (By Title.)
S. N. Roy and Marvin Kastenbaum.


Associated with any design there is a general cell—say an m-dimensional cell—in which 
we have a number of observations classified into, say, p-dimensional cells where p is the 
number of "variates" or "number of ways of classification." The whole data can now be
regarded as being arranged in an (m + p)-way classification such that the m-cells and p-cells 
are, as it were, two different kinds of marginals. Bearing in mind the nature of this differ- 
ence, the usual estimation and testing procedures in analysis of variance and covariance 
are generalized to the situations indicated above. The generalization of multivariate 
analysis of variance of means to the above situations will be discussed in a later paper. 


20. Some Analytic Properties of Markoff Functions: Denumerable Case. (By
Title.) Donald G. Austin, Syracuse University.


Let $p_{ij}(t)$, $0 < t < \infty$, $i, j = 1, 2, \ldots$, be the stationary transition matrix of a Markov
chain. The author extends his earlier result (to appear in Proc. Nat. Acad. Sci.) that
$Dp_{ii}(0) = -q_i > -\infty$ implies $p_{ij}(t)$ has a continuous derivative on $[0, \infty]$. It is shown that
if $q_j < \infty$, $p_{ij}(t)$ has a continuous derivative on $[0, \infty]$. In either case $\lim_{t \to \infty} Dp_{ij}(t)$
exists and is equal to 0. If $q_i < \infty$, then $Dp_{ij}(t + s) = \sum_k Dp_{ik}(t)\,p_{kj}(s)$; if $q_i, q_j < \infty$, then
$Dp_{ij}(t + s) = \sum_k p_{ik}(t)\,Dp_{kj}(s)$ for $t, s > 0$.





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items


Thomas L. Austin, Jr., formerly Research Assistant at the University of 
Georgia, has accepted a position as Mathematician with the Dept. of Defense 
in Washington. 

Professor Francisco Azorin P. of the University of Madrid, Spain, spent the 
academic year 1954-55 at the Universidad Central de Venezuela under 
UNESCO’s Technical Assistance Program. 

Maurice H. Belz has been appointed as Professor of Statistics in the Uni- 
versity of Melbourne. 

Francesco Bignardi was designated Statistical Fellow at the University of 
Palermo (Italy) where he is charged with the teaching of Social Statistics in the 
Statistical School. He is also chief of the Economical and remap Service of 
the Banco di Sicilia’s Presidence. 

Arthur B. Brown has been promoted to Professor of Mathematics at Queens 
College, Flushing, N. Y. 

Dr. E. J. Gumbel was Visiting Professor for Mathematical Statistics at the 
Free University of Berlin (West) for the Summer Term 1955. 

Harry M. Hughes, formerly Assistant Professor, University of California, 
Berkeley, is now Analytical Statistician, Dept. of Biometrics, USAF School of 
Aviation Medicine, Randolph Field, Texas. 

Dr. H. Paul Kelley is now on active duty as a Research Psychologist at the 
Naval School of Aviation Medicine, U. S. Naval Air Station, Pensacola, Florida.
He was formerly an Educational Testing Service Psychometric Fellow and more 
recently was with the U.S. Air Force Personnel and Training Research Center 
at Lackland Air Force Base, San Antonio, Texas. 

Wharton F. Keppler has recently transferred from his position as Statistician, 
U.S. Naval Ordnance Test Station, Inyokern, California, to Eglin Air Force
Base, Florida, where he is an Operations Analyst, Office of Operations Analysis, 
Dept. of Chief of Staff/Operations (DCS/O), Hq. Building, Air Proving Ground 
Command. 

Robert J. Nichol is Statistician, Quality Control Group, Planning Dept., 
R.C.A. Service Co., Inc., Missile Test Project, Patrick Air Force Base, Florida. 

Gottfried E. Noether of Boston University has been promoted to Associate 
Professor. 

Don C. Price is now employed in the Engineering Department of the Good- 
year Aircraft Corporation, Akron, Ohio. 

Ronald Pyke having received his M.Sc. degree in Mathematical Statistics is 
continuing at the University of Washington as Research Assistant while working
toward a Doctorate.

Robert L. Rogers recently resigned from his statistical and accounting duties
with Stokely-Van Camp, Inc. in order to accept a position as Mathematician
in the Computing Bureau with International Business Machines, Inc., Los
Angeles, California.

C. H. Springer has recently accepted a position with the Aircraft Gas Turbine 
Development Department, General Electric Company, Cincinnati, Ohio. As 
Component Testing Evaluation Engineer, he will be engaged in the application 
of statistical principles to the design and evaluation of Jet Engine Research 
Testing Operations. 

Jerome R. Steen has become Manager of Quality Control for the Radio and 
Television Division of Sylvania Electric Products Inc. He is at present located 
at Batavia, N. Y. 

Dr. Joseph V. Talacko, Assistant Professor of Mathematics, Marquette Uni- 
versity, Milwaukee, Wisconsin, returned in February from Berkeley to Mil- 
waukee. He plans to spend the second half of his 1954/55 Ford Foundation 
Fellowship at the University of Chicago. 

Dr. G. S. Watson has resigned from the Department of Statistics, University 
of Melbourne, to take up a new position as Senior Fellow, Dept. of Statistics, 
The Australian National University, Canberra. 

William Wolman, formerly with Naval Inspector of Ordnance, Eastman Kodak 
Co., Rochester, N. Y., is now head of the Statistical Methodology and Reli- 
ability Section, Statistics Branch, Quality Control Division, Bureau of Ordnance, 
Navy Department in Washington, D. C. 


James K. Yarnold, formerly a graduate student and Research Assistant at the 
Statistical Laboratory, University of California, Berkeley, is now a graduate 
student in Mathematical Statistics at the University of Illinois and a Research 
Assistant in the University of Illinois Training Research Laboratory. 

Mr. H. Zindler is now Referent für Mathematische Statistik in Abteilung VIII 
des Statistischen Bundesamtes, Wiesbaden-Biebrich, Rheinstr. 25, Germany. 




Educational Testing Service 


The Educational Testing Service is offering for 1956-57 its ninth series of re- 
search fellowships in psychometrics leading to the Ph.D. degree at Princeton 
University. Open to men who are acceptable to the Graduate School of the Uni- 
versity, the two fellowships each carry a stipend of $2,500 a year and are nor- 
mally renewable. Fellows will be engaged in part-time research in the general 
area of psychological measurement at the offices of the Educational Testing 
Service and will, in addition, carry a normal program of studies in the Graduate 
School. 

Suitable undergraduate preparation may consist either of a major in psychol- 
ogy with supporting work in mathematics, or a major in mathematics together 
with some work in psychology. However, in choosing fellows, primary emphasis 
is given to superior scholastic attainment and demonstrated research ability 
rather than to specific course preparation. The closing date for completing appli- 
cations is January 12, 1956. Information and application blanks will be available 
about October 1 and may be obtained from: Director of Psychometric Fellowship 
Program, Educational Testing Service, 20 Nassau Street, Princeton, New 
Jersey. 


New Members 


The following persons have been elected to membership in the Institute 


February 9, 1955 to May 11, 1955 


Baumann, Carl O., B.A. (American International College), Assistant Statistician, Mon- 
santo Chemical Company, Indian Orchard, Massachusetts, 221 Britton Street, Fairview, 
Massachusetts. 

Beckwith, Richard E., B.S. (Stanford Univ.), Research Assistant, Case Institute of Tech- 
nology, Cleveland 6, Ohio. 

Birch, John J., B.S. (Brown Univ.), Graduate Student in Statistics, University of Cali- 
fornia, Berkeley, California, 540 Alcatraz Avenue, Oakland 9, California. 

Block, Aaron, B.A. (Brooklyn College), Analyst, City Planning Department, Office of 
Master Planning, 15 Park Row, New York, New York, 102-35 64th Road, Forest Hills 76, 
New York. 

Blum, Joseph, M.A. (George Washington Univ.), Machine Methods Analyst, Assistant 
Branch Chief of Spec. Processing Branch, National Security Agency, Washington 25, 
D. C., 4314 N. Pershing Drive, Arlington 3, Virginia. 

Boyd, Evelyn, Ph.D. (Yale Univ.), Mathematician, Department of the Army, Diamond 
Ordnance Fuse Laboratory, Washington 25, D. C., 1353 Ritchie Place, N. E., Washington 
17, D.C. 

Breakwell, John Valentine, Ph.D. (Harvard Univ.), Senior Research Engineer, North 
American Aviation, Inc., Downey, California, 21 Temple Avenue, Long Beach 8, Cali- 
fornia. 

DeMarr, Ralph A., M.A. (Washington State College), Member of Technical Staff, Bell 
Telephone Laboratory, Whippany, New Jersey, 29 DeBary Place, Summit, New Jersey. 

Dubay, Joseph A., M.A. (Harvard Univ.), Graduate Student, Committee on Statistics, 
University of Chicago, Chicago 37, Illinois, 22 Snell Hall, 5709 South Ellis Avenue, Chicago 
37, Illinois. 

Friedman, Henry D., Ph.D. (Penn. State Univ.), Mathematician, General Electric Co., 
Electronics Park, Syracuse, New York, Room 235, Bldg. 3. 

Guthrie, Donald Jr., B.Sc. (Stanford Univ.), Student and Department Teaching Assistant, 
Dept. of Mathematical Statistics, Columbia University, New York 27, New York. 

Heit, Paul B., B.S. (C.C.N.Y.), Graduate Student, Columbia University, New York 27, 
New York, 1540 Walton Avenue, Bronx 52, New York. 

Hopkins, John W., Ph.D. (Univ. of London), Biometrician, Division of Applied Biology, 
National Research Council, Ottawa 2, Canada. 

Inman, Patricia, B.A. (Reed College), Student and Research Assistant, Department of 
Mathematics, University of Oregon, Eugene, Oregon. 

Kastenbaum, Marvin A., M.S. (North Carolina State College), Graduate Assistant, Department 
of Biostatistics, School of Public Health, University of North Carolina, Chapel 
Hill, North Carolina. 

Kendall, G. R., M.A. (Univ. of Toronto), Meteorologist and Climatologist, Meteorological 
Division, Department of Transport, 315 Bloor Street W., Toronto, Ontario, Canada, 
2099 Stavebank Road, R.R. #1, Port Credit, Ontario, Canada. 

Lanahan, James F., M.S. (Univ. of Michigan), Instructor of Mathematics, University of 
Detroit, Detroit 21, Michigan. 

Madansky, Albert, A.B. (Univ. of Chicago), Student in Committee on Statistics, University 
of Chicago, Eckhart Hall, Chicago, Illinois, 1516 S. Kostner, Chicago 23, Illinois. 

Oneson, Thomas M., M.A. (Univ. of Illinois), Statistician, Owens-Corning Fiberglas Corp., 
Case Avenue, Newark, Ohio. 

Paul, William H., B.S. (Penn. State College), Project Supervisor, Aircraft Instrumentation 
Display Studies, Stavid Engineering, Inc., 312 Park Avenue, Plainfield, New Jersey. 

Perrin, Edward B., B.A. (Middlebury College), Graduate Student, Department of Mathematical 
Statistics, Columbia University, 179 S. Main Street, Barre, Vermont. 

Rothman, Stanley, M.A. (Columbia Univ.), Staff Member, RAMO Wooldridge Co., 8820 
Bellanca Avenue, Los Angeles, California. 

Sax, Edward, B.A. (Wayne Univ.), Analytical Statistician, 1121 New Hampshire Avenue, 
#4083, Washington 6, D.C. 

Seibel, Melvin H., B.S. (Univ. of Illinois), Physicist, Statistical Section, Engineering and 
Technical Department, Army Electronic Proving Ground, Fort Huachuca, Arizona. 

Stevens, Kerry N., B.A. (Central Wash. College), Graduate Student, University of Washington, 
Seattle 5, Washington, 120 East 110th, Seattle 55, Washington. 

Stuart, Alan, B.Sc. (Univ. of London), Senior Research Officer and Lecturer, Division of 
Research Techniques, London School of Economics, Houghton Street, Aldwych, London 
W.C. 2, England. 

Suurballe, John W., M.Sc. (State Univ. of Iowa), Applied Mathematician, Farnsworth 
Electronics Co., Fort Wayne, Indiana, 2407 North Anthony, Fort Wayne, Indiana. 

Thampuran, D. V., M.S. (Univ. of Travancore), Research Assistant, Central Marine Fish- 
eries Research Station, Madapam Camp., P.O., S. Rly, India. 

Wartmann, Rolf, Dr. (Technische Hochschule Darmstadt), Sachbearbeiter für Technische 
Statistik, Verein Deutscher Eisenhüttenleute, Düsseldorf, Breite Str. 27, Germany. 

Weibull, Martin, Fil. Dr. (The Royal Univ. of Lund), Docent, University of Lund, Lund, 
Sweden, Department of Statistics, Solvegatan 5, Lund, Sweden, Revingeg 17, Lund, 
Sweden. 

Wendel, J. G., Ph.D. (California Inst. of Tech.), Associate Professor of Mathematics, 
Louisiana State University, Baton Rouge, Louisiana. 

Wetzel, Wolfgang, Dr. rer. pol. (Freie Univ. Berlin), Wissenschaftlicher Assistent, Seminar 
für Statistik der Freien Universität Berlin, Berlin-Dahlem, Bachstelzenweg 29/31, Berner 
Str. 19, Berlin-Lichterfelde West. 





REPORT OF THE CHAPEL HILL MEETING OF THE INSTITUTE 


The 1955 Eastern Regional Meeting, sixty-fifth meeting of the Institute of 
Mathematical Statistics, was held in Chapel Hill, North Carolina, April 22-23, 
1955. A meeting of the Biometric Society (Eastern North American Region) 
was held in Chapel Hill at the same time. 

The following 80 members of the Institute registered for the meeting: 


R. L. Anderson, T. W. Anderson, Helen Bozivich, R. H. Brunelle, Benjamin Buchbinder, 
D. L. Burkholder, J. M. Cameron, R. L. Carter, J. T. Chu, W. H. Clatworthy, A. C. Cohen, 
Jr., Theodore Colton, W. C. Connor, L. M. Court, E. L. Cox, Gertrude Cox, P. P. Crump, 
Claude de Courval, Cyrus Derman, Alfred Descloux, Elizabeth Doan, D. B. Duncan, 
Churchill Eisenhart, Lillian Elveback, Benjamin Epstein, S. M. Free, J. E. Freund, Sey- 
mour Geisser, H. S. Graf, B. G. Greenberg, S. W. Greenhouse, F. E. Grubbs, S. S. Gupta, 
R. J. Hader, Max Halperin, Boyd Harshbarger, Wassily Hoeffding, Jacob Horowitz, D. G. 
Horvitz, Harold Hotelling, W. G. Howe, J. S. Hunter, D. C. Hurst, Mohammad Iqbal, K. 
Ito, A. W. Kimball, Julius Lieblein, D. V. Lindley, Eugene Lukacs, J. H. MacKay, F. S. 
McFeely, H. A. Meyer, D. F. Morrison, M. E. Muller, V. N. Murty, C. R. Newell, G. E. 
Nicholson, Jr., G. E. Noether, Wyman Richardson, D. L. Richter, David Rosenblatt, Joan 
R. Rosenblatt, A. E. Sarhan, Roberto Sasso, F. E. Satterthwaite, M. A. Schneiderman, 
H. Smith, W. L. Smith, G. W. Snedecor, P. N. Somerville, Jacques St-Pierre, H. C. Sweeny, 
Z. Szatrowski, W. A. Thompson, Jr., M. E. Turner, Jr., M. C. K. Tweedie, Lionel Weiss, 
J. W. Wilkinson, R. L. Wine, Marvin Zelen. 


The program was as follows: 


FRIDAY, APRIL 22, 1955 


8:30 a.m. Joint Session with Biometric Society. 
Chairman: H. Fairfield Smith, North Carolina State College. 


Papers: 1. Life Testing in the Discrete Case, Franklin S. McFeely and John E. 
Freund, Virginia Polytechnic Institute. 
2. The Components of Variance and the Correlation Between Relatives in Sym- 
metrical Random Mating Populations, Ted Horner, Iowa State College. 
3. Tests of Hypotheses when the Decision is Based on Several Criteria (Prelimi- 
nary Report), Irwin Miller and John E. Freund, Virginia Polytechnic 
Institute. 
4. Power Function of Procedures for Some Components of Variance Models, 
Helen Bozivich, Iowa State College. 
5. Preference Patterns for Decisions on Means, R. Lowell Wine and John E. 
Freund, Virginia Polytechnic Institute. 
(Papers 1, 3, and 5 were on work sponsored by the Office of Ordnance Research, U.S. Army.) 


10:30 a.m. Problems of Probability. 
Chairman: Eugene Lukacs, Office of Naval Research. 


Papers: 1. Differentiation of Markov Transition Functions, Donald G. Austin, Syra- 
cuse University. 
2. An Extension of the Kolmogorov Limit Theorem, Jerry Blackman, Syracuse 
University. 
3. Non-recurrent Random Walks, Cyrus Derman, Syracuse University. 


2:00 p.m. Multivariate Analysis. 
Chairman: Harold Hotelling, University of North Carolina. 


Papers: 1. Principal Components and Factor Analysis, T. W. Anderson, Columbia 
University. 
2. Some Contributions to Factor Analysis, William G. Howe, Oak Ridge Na- 
tional Laboratory and University of North Carolina. 
3. Analysis of Variance of Correlated Variates with Heterogeneous Variances, 
H. C. Sweeney, Virginia Polytechnic Institute. 


4:00 p.m. Contributed Papers I. 
Chairman: George E. Nicholson, Jr., University of North Carolina. 


Papers: 1. Estimation of Location and Scale Parameters by Order Statistics from Singly 
and Doubly Censored Samples. Part I. The Normal Distribution up to 
Samples of Size 10, A. E. Sarhan and B. G. Greenberg, University of 
North Carolina. 
2. An Application of Chung’s Lemma to the Kiefer-Wolfowitz Stochastic Ap- 
proximation Procedure, Cyrus Derman, Syracuse University. 
3. Simplified Estimators Based on Order Statistics (Preliminary Report), 
Benjamin Epstein, Wayne University. 
4. Distribution of the Difference Between the Two Largest Sample Values (Pre- 
liminary Report), A. Zinger and J. St-Pierre, University of Montreal. 
5. Some Continuous Monte Carlo Methods for the Dirichlet Problem, Mervin 
E. Muller, Cornell University. 
6. On the Distribution of the Number of Successes in Independent Trials, 
Wassily Hoeffding, University of North Carolina. 
7. On the Solution of Truncated and Censored Sample Estimating Equations for 
Normal Populations, A. C. Cohen, Jr., University of Georgia. 
8. The Modified Mean Square Successive Difference and Related Statistics, 
Seymour Geisser, University of North Carolina. 
9. The Distribution of the Ratios of Certain Quadratic Forms in Time Series 
(By Title), Seymour Geisser, University of North Carolina. 


SATURDAY, APRIL 23, 1955 
8:30 a.m. Symposium on Relation Between Smoking and Mortality from Lung 
Cancer. (Co-sponsored by Biometric Society.) 
Chairman: B. G. Greenberg, University of North Carolina. 


Papers: 1. Current Status of the Problem, Jerome Cornfield, National Institute of 
Health. 
2. Needed Future Work, William Haenszel, National Cancer Institute. 


Discussants: Boyd Harshbarger, Virginia Polytechnic Institute, Daniel Horn, Ameri- 
can Cancer Society. 


2:00 p.m. Contributed Papers II. 


Chairman: Lionel Weiss, University of Virginia. 


Papers: 1. The “Inefficiency” of the Sample Median for Many Familiar Symmetric 
Distributions, J. T. Chu, University of North Carolina. 
2. On Some Stochastic Models of Behavioral Interaction of Organization Theory, 
David Rosenblatt, American University. 
3. On Inverting a Class of Patterned Matrices, Part I, S. N. Roy and A. E. 
Sarhan, University of North Carolina. 
4. Convergence Properties of a General Stochastic Approximation Process 
(Preliminary Report), Donald Burkholder, University of North 
Carolina. 
5. Distribution of Rounding Off Errors in Some Numerical Processes, Part I, 
A. E. Sarhan, University of North Carolina. 
6. On a Measure of the Information Provided by an Experiment (Preliminary 
Report), D. V. Lindley, University of Cambridge and University of 
Chicago. 
7. A Comparison Between Alternative Techniques Using Supplementary In- 
formation in Sample Survey Design (Preliminary Report), El Mahdy 
Said, North Carolina State College (introduced by R. L. Anderson). 
8. The Canonical Distribution of the Non-central Rectangular Co-ordinates, 
Miss Aleyamma George, University of North Carolina and University 
of Travancore (introduced by H. Hotelling). 
9. Confidence Interval Estimation for the Parameters of a Rectangular and an 
Exponential Population in Terms of Complete or Censored Samples (Pre- 
liminary Report) (By Title), S. N. Roy and A. E. Sarhan, University 
of North Carolina. 
10. Some Generalizations of Analysis of Variance and Covariance to the Case of 
Discrete Variates or of Grouping in Qualitative Categories (By Title), 
S. N. Roy and Marvin Kastenbaum, University of North Carolina. 
11. Some Analytic Properties of Markoff Functions: Denumerable Case (By 
Title), Donald G. Austin, Syracuse University (introduced by C. 
Derman). 


LIONEL WEISS 
Associate Secretary 




PUBLICATIONS RECEIVED 


Tables of Sines and Cosines for Radian Arguments, National Bureau of Standards, Applied 
Mathematics Series 43, U.S. Government Printing Office, Washington, D. C., 1955, 
278 pp., $3.00. 














ESTADISTICA 


Journal of the Inter American Statistical Institute 


Volume XIII, No. 47 Contents June 1955 


La Fuerza de Trabajo de México: Un Análisis de Su Estructura, Sus Características y Su 
Evolución 
    Emilio Uribe Romo 
Determinación de la Población Económicamente Activa con Fines de Comparabilidad 
Internacional 
    Roque García-Frías y O. Alexander de Moraes 
Lung Cancer in the Twentieth Century 
    Halbert L. Dunn 
Proyecto para el Desarrollo de la Estadística Sanitaria de Venezuela 
    William C. James y Elena de Ochoa 
Normas Internacionales para las Estadísticas de Educación y de Cultura 
    B. A. Liu y J. Tena-Artigas 
Life Table Studies in Brazil - Estudos sôbre a Longevidade no Brasil 
    T. N. E. Greville e Nelson Luís de Araújo Moraes 
Análisis Económico del Presupuesto de Venezuela para el Año Fiscal 1954-55 
    Diego Madero Leiva 
La “Confidencialidad” como Recurso para Mejorar las Estadísticas Vitales y Sanitarias 
    Hernán Romero y Jerjes Vildósola 
Concepts and Methods Used in the Current Labor Force Statistics Prepared by the U. S. 
Bureau of the Census (1954) 
    U. S. Bureau of the Census 
Estudio Preliminar de un Plan Común de Trabajo para el Servicio Estadístico del Seguro 
Social 
    Comité Interamericano de Seguridad Social 
Institute Affairs. Statistical News. Publications. 


Published quarterly Annual subscription price $3.00 (U. S.) 


INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D.C. 


THE AMERICAN STATISTICAL ASSOCIATION 


announces the publication of two new monographs: 


Statistical Problems of the Kinsey Report 


by Cochran, Mosteller and Tukey. The evaluation of the statistical methodology used by 
Kinsey and his associates in their first volume. This study was requested by the Committee 
for Research in the Problems of Sex of the National Research Council, which is sponsoring 
Dr. Kinsey’s work. The contents include Statistical Problems of the Kinsey Report, Dis- 
cussion of comments by Selected Technical Reviewers, Comparison with Other Studies, Pro- 
posed Further Work, Probability Sampling Considerations, The Interview and The Office, 
Desirable Accuracy, Principles of Sampling. 

The monograph contains 331 pages, plus a foreword, preface and index; bound in blue 
buckram; $3.00 to ASA members; $5.00 to others. 


Proceedings of the Business and Economics Statistics Section. 


The papers given at the sessions sponsored by the Business and Economics Statistics Section 
at the Annual Meeting of the American Statistical Association in Montreal in September 
1954. This volume contains papers on Pension Funds, Business Outlook, International Pay- 
ments, Consumer Survey Data, Forecasting, Employment and Unemployment Statistics, 
Stock Market, Government Statistics, Measurement of Saving and Investment, Mobilization, 
Productivity. 
Approximately 250 pages, paper bound. Price $2.00 to ASA members; $3.00 to others. 

Copies may be ordered directly from the AMERICAN STATISTICAL ASSOCIATION, 1108 Sixteenth 
Street, N. W., Washington 6, D. C. 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 23, No. 4 - October, 1955 


Survey of Economic Forecasting Techniques 
Note on the Non-existence of the Social Welfare Function 
Correlation and Regression Estimates when the Data are Ratios 
A Comparison of Treatments of a Duopoly Problem (Part II) 
The Stability of Technical Coefficients 
Production Functions and British Coal Mining 
Note on an Inventory Problem Discussed by Modigliani and Hohn 
On Extrema with Side Conditions 
Book Reviews 


The Growth and Fluctuation of the British Economy 1790-1850 (Arthur D. Gayer, W. W. Rostow, and Anna 
Jacobson Schwartz). Review by E. J. M. Buckatzsch 


Soviet National Income and Product 1940-48 (Abram Bergson and Hans Heymann, Jr.). Review by Maurice 